Abstract

A central problem in differentially private data analysis is how to design efficient algorithms 
capable of answering large numbers of counting queries on a sensitive database. Counting
queries are of the form "What fraction of individual records in the database satisfy the property
$q$?" We prove that if one-way functions exist, then there is no algorithm that takes as input a
database $D \in (\{0,1\}^d)^n$ and $k = \tilde{\Theta}(n^2)$ arbitrary efficiently computable counting queries, runs
in time poly(d, n), and returns an approximate answer to each query, while satisfying differential
privacy. We also consider the complexity of answering "simple" counting queries, and make some
progress in this direction by showing that the above result holds even when we require that the
queries are computable by constant-depth ($AC^0$) circuits.

Our result is almost tight in the sense that nearly $n^2$ counting queries can be answered
efficiently while satisfying differential privacy. Moreover, super-polynomially many queries can 
be answered in exponential time. 

We prove our results by extending the connection between differentially private counting 
query release and cryptographic traitor-tracing schemes to the setting where the queries are 
given to the sanitizer as input, and by constructing a traitor-tracing scheme that is secure in 
this setting. 

1 Introduction 

Consider a database $D \in (\{0,1\}^d)^n$, in which each of the n rows corresponds to an individual's
record, and each record consists of d binary attributes. The goal of privacy-preserving data analysis 
is to enable rich statistical analyses on the database while protecting the privacy of the individuals.
It is especially desirable to preserve differential privacy [DMNS06], which guarantees that no
individual's data has a significant influence on the information released about the database. 

Some of the most basic statistics on a database are counting queries, which are queries of the 
form, "What fraction of individual records in D satisfy some property q?" In particular we would
like a differentially private sanitizer that, given a database D and k counting queries $q_1, \dots, q_k$ from
the family Q, outputs an approximate answer to each of the queries. We would like the number of
queries, k, to be as large as possible, and the set of feasible queries, Q, to be as general as possible.
Ideally, Q would contain all counting queries.$^1$ Moreover, we would like the algorithm to run as
efficiently as possible. 

Early work in differential privacy [DN03, BDMN05, DMNS06] gave an efficient sanitizer — the 
so-called Laplace Mechanism. The Laplace Mechanism answers any set of k arbitrary efficiently 
computable counting queries by perturbing the answers with appropriately calibrated noise, providing
good accuracy (say, within ±.01 of the true answer) as long as $k < n^2$.
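
As a rough illustration of the mechanism just described, the following Python sketch answers k counting queries by adding independent Laplace noise to each true answer. The function names are hypothetical, and the noise scale k/(εn) is an assumption chosen for simplicity (an even split of the privacy budget under basic composition). The accuracy for k close to $n^2$ mentioned above relies on a more careful analysis that this sketch does not reproduce.

```python
import random

def counting_query(predicate, database):
    """Fraction of rows in the database that satisfy the predicate."""
    return sum(1 for row in database if predicate(row)) / len(database)

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as a scaled difference of two Exp(1) draws."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def laplace_mechanism(database, predicates, epsilon, rng=None):
    """Answer each counting query with independent Laplace noise.

    Assumption: each query has sensitivity 1/n and the budget epsilon is
    split evenly over the k queries, so each answer gets noise of scale
    k / (epsilon * n).
    """
    rng = rng or random.Random()
    n, k = len(database), len(predicates)
    scale = k / (epsilon * n)
    return [counting_query(q, database) + laplace_noise(scale, rng)
            for q in predicates]

# Hypothetical toy data: 4 records with d = 2 binary attributes each.
db = [(1, 0), (1, 1), (0, 1), (1, 1)]
queries = [lambda x: x[0] == 1, lambda x: x[0] == 1 and x[1] == 1]
answers = laplace_mechanism(db, queries, epsilon=1.0, rng=random.Random(0))
```

With only 4 records the noise dominates the answers; the noise scale shrinks as n grows relative to k, which is the regime the text refers to.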

The ability to approximately answer $n^2$ counting queries is indeed quite powerful, especially
in settings where data is abundant and n is large. However, being limited to $n^2$ queries can be
restrictive in settings where data is expensive or otherwise difficult to acquire, and n is small. It
can also be restrictive when the budget of queries is shared between multiple analysts. Fortunately,
a remarkable result of Blum et al. [BLR08] (with subsequent developments in [DNR+09, DRV10,
HLM10]) showed that differentially private algorithms are not limited to $n^2$ queries. They showed
how to approximately answer arbitrary counting queries even when k is exponentially larger than
n. Unfortunately, their algorithm, and all subsequent algorithms capable of answering more than
$n^2$ arbitrary counting queries, run in time (at least) $\mathrm{poly}(2^d, n, k)$.

The result of Blum et al. raises the exciting possibility of an efficient algorithm that can privately
compute approximate answers to large numbers of counting queries. Unfortunately, Dwork
et al. [DNR+09] gave evidence that efficient sanitizers are inherently less powerful than their
computationally unbounded counterparts. They consider the related problem of counting query release.
In the counting query release problem, the goal is to produce a summary from which approximate
answers to every query in Q can be computed, and we'd like the sanitizer and the summary both
to run in time much less than the size of Q. In this setting, Dwork et al. constructed a family of
roughly $2^{\sqrt{n}}$ queries that, under certain cryptographic assumptions, cannot be released efficiently
(in time poly(d, n)), even though any family of size at most $\sim 2^n$ can be released by a computationally
unbounded algorithm. For any family Q, efficiently solving the counting query release problem for
Q implies an efficient sanitizer for any polynomial number of queries from Q. (See the related work 
for more discussion of this relationship.) Thus, hardness results for counting query release rule out 
a particular way of constructing efficient sanitizers. However, ultimately an analyst will only be 
able to ask a polynomial number of queries, and impossibility results for counting query release 
still leave room for optimism that there might be an efficient sanitizer that can answer many more 
arbitrary counting queries than the Laplace Mechanism. 

Unfortunately, we show that this is not the case. We show that there is no efficient, differentially 
private algorithm that takes a database $D \in (\{0,1\}^d)^n$ and $\tilde{O}(n^2)$ arbitrary, efficiently computable
counting queries as input and outputs an approximate answer to each of the queries. One way to
summarize our results is that, unless we restrict the set of allowable queries Q or allow exponential
running time, the Laplace Mechanism is essentially the best possible algorithm for answering
counting queries. 

$^1$ It may require super-polynomial time just to evaluate an arbitrary counting query, which would rule out efficiency
for reasons that have nothing to do with privacy. For this discussion, we will always assume that the queries are
efficiently computable, and are not the bottleneck in the computation.

1.1 Our Results and Techniques

In this paper we give new hardness results for answering counting queries while satisfying differential 
privacy. To make the statement of our results more concrete, we will assume that the counting 
queries are given to the sanitizer as input in the form of circuits that, on input an individual record 
$x \in \{0,1\}^d$, decide whether or not the record x satisfies the property q. We say the queries are
efficiently computable if the corresponding circuits are of size poly(d, n). 

Theorem 1.1. Assuming the existence of one-way functions, there is no algorithm that, on input a
database $D \in (\{0,1\}^d)^n$ and $\tilde{O}(n^2)$ efficiently computable counting queries, runs in time poly(d, n)
and returns an approximate answer to each query to within ±.49, while satisfying differential
privacy.

We also show that the same theorem holds even for queries that are computable by
unbounded-fan-in circuits of depth 6 over the basis $\{\wedge, \vee, \neg\}$ (a subset of the well-studied class
$AC^0$), albeit under strong (but still plausible) cryptographic assumptions.

Theorem 1.2. Under the assumptions described in Section 5.2, there is no algorithm that, on
input a database $D \in (\{0,1\}^d)^n$ and $\tilde{O}(n^2)$ efficiently computable depth-6 queries (circuits), runs in
time poly(d, n) and returns an approximate answer to each query to within ±.49, while satisfying 
differential privacy. 

We now describe the techniques required to prove our results. 

The Connection with Traitor-Tracing We prove our results by building on the connection between
sanitizers for counting queries and traitor-tracing schemes utilized by Dwork et al. [DNR+09].
Traitor-tracing schemes were introduced by Chor, Fiat, and Naor [CFN94] for the purpose of identifying
pirates who violate copyright restrictions. Roughly speaking, a (fully collusion-resilient)
traitor-tracing scheme allows a sender to generate keys for n users so that 1) the sender can broadcast
encrypted messages that can be decrypted by any user, and 2) any efficient pirate decoder capable
of decrypting messages can be traced to at least one of the users who contributed a key to it, even
if an arbitrary coalition of the users got together to contribute their keys.

Dwork et al. show that the existence of traitor-tracing schemes implies hardness results for the 
counting query release problem. Very informally, their argument is as follows: Suppose a coalition 
of users takes their keys and builds a database $D \in (\{0,1\}^d)^n$ where each record contains one of
their user keys. The family Q will contain a query $q_c$ for each possible ciphertext c. The query will
ask "What fraction of the records (user keys) in D will decrypt the ciphertext c to the message 1?"
Every user can decrypt, so if the sender encrypts a (single-bit) message m as a ciphertext c, then
every user will decrypt c to m. Thus the answer to the counting query $q_c$ will be m.
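
The shape of the query $q_c$ can be sketched in a few lines of Python. This is only a toy under loud assumptions: the "encryption" below is a hypothetical hash-based bit mask, and all users share one key, so nothing here is traceable and it is not a traitor-tracing scheme. It only shows that when every record decrypts c to m, the counting query answers m.

```python
import hashlib
import secrets

def prf_bit(key: bytes, nonce: bytes) -> int:
    """Toy pseudorandom bit derived from a key and a nonce (illustration only)."""
    return hashlib.sha256(key + nonce).digest()[0] & 1

def encrypt(shared_key: bytes, message_bit: int):
    """Toy ciphertext: a fresh nonce plus the message bit masked by a PRF bit."""
    nonce = secrets.token_bytes(16)
    return nonce, message_bit ^ prf_bit(shared_key, nonce)

def decrypt(user_key: bytes, ciphertext) -> int:
    """Unmask the bit using the key stored in one database record."""
    nonce, masked = ciphertext
    return masked ^ prf_bit(user_key, nonce)

def query_for_ciphertext(ciphertext):
    """The counting query q_c: fraction of records (keys) that decrypt c to 1."""
    def q_c(database):
        return sum(decrypt(key, ciphertext) for key in database) / len(database)
    return q_c

# Assumption: every record holds the same key, unlike a real scheme.
shared = secrets.token_bytes(16)
database = [shared for _ in range(5)]
results = {m: query_for_ciphertext(encrypt(shared, m))(database) for m in (0, 1)}
```

An accurate answer to $q_c$ (say within ±.49) therefore reveals m, which is why a sanitizer that answers such queries behaves like a decoder.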

Suppose there were an efficient algorithm that could release the family Q. Then the coalition 
could use it to efficiently produce a summary of the database D that enables them to efficiently 
compute an approximate answer to every query q c , which would also allow them to efficiently 
decrypt the message. Such a summary will be an efficient pirate decoder, and thus the tracing 
algorithm can use its answers to identify one of the users in the coalition. However, if there is a 
way to identify one of the users in the database from the summary, then the summary cannot be 
differentially private. 

In order to instantiate their result, they need a traitor-tracing scheme. Since Q contains a 
query for every ciphertext, the parameter to optimize is the length of the ciphertexts. Using the
fully collusion-resilient traitor-tracing scheme of Boneh, Sahai, and Waters [BSW06], which has
ciphertexts of length $\sim \sqrt{n}$, they obtain a family of queries of size $\sim 2^{\sqrt{n}}$ that cannot be released
efficiently.

Our Approach In our setting, we don't expect to answer every query in Q, only the set of 
$k \ll |Q|$ input queries. At first glance, this should make answering the queries much easier, and
thus make it more difficult to demonstrate hardness. However, the attacker does have the power to 
choose the queries he wants answers to, and can choose just the queries that are most harmful to 
privacy. Our first observation is that in the traitor-tracing scenario, the tracing algorithms will only 
query the pirate decoder on a polynomial number of ciphertexts, which will be randomly chosen 
and depend on the particular keys that were instantiated for the scheme. For many schemes, even 
$\tilde{O}(n^2)$ queries are sufficient. Thus it would seem that the tracing algorithm could simply decide
which queries it will make, give those queries as input to the sanitizer, and then use the answers 
to those queries to identify a user and violate differential privacy. 

Although this observation would seem to be sufficient to establish our main result, the intuition 
we sketched ignored an important issue. Many traitor-tracing schemes (including that of [BSW06]) 
are only able to trace stateless pirate decoders, which essentially commit to a response to each 
possible query (or a distribution over responses) once and for all. For the counting query release 
problem, the private summary is necessarily stateless, and thus the result of Dwork et al. can 
be instantiated with any scheme that allows tracing of stateless pirate decoders. However, the 
type of sanitizer we consider might give answers that depend on the whole sequence of queries, and is given
all the queries it will have to answer at once. Thus, in order to prove our results, we will need 
a traitor-tracing scheme that can trace stateful pirate decoders, and selects all its queries to the 
pirate decoder at once. 

The problem of tracing stateful pirates is quite natural even without the implications for private 
data analysis. To be sure, this problem has been studied in the literature, originally by Kiayias 
and Yung [KY01]. However, their solution, and all others known, do not apply to our specific
setting (see discussion below). Instead, we refine the basic connection between traitor-tracing
schemes and differential privacy by showing that, in many respects, fairly weak traitor-tracing 
schemes suffice to establish the hardness of preserving privacy. In particular, although the pirate 
decoder may be stateful, it will be constrained in other ways, which will make it easier to construct 
a traitor-tracing scheme for these kinds of pirates. Indeed, we construct such a scheme to establish 
Theorem 1.1. The scheme will also have weakened requirements in other respects, having nothing 
to do with the statefulness of the pirate. These weakened requirements allow us to reduce the 
complexity of the decryption, which means that the queries used by the attacker do not need to be 
arbitrary polynomial-size circuits, but instead can be circuits of constant depth, which will establish
Theorem 1.2. See Sections 3.1 and 4 for a precise statement of the kind of traitor-tracing scheme 
that will suffice and Section 5 for our construction. 

1.2 Related Work 

Traitor-Tracing Schemes Chor, Fiat, and Naor [CFN94] introduced the notion of a traitor-tracing
scheme, and such schemes have been studied extensively as a means of distributing copyrighted
content. The connection between traitor-tracing schemes and hardness results for differentially
private sanitizers (discussed above) was discovered by Dwork et al. [DNR+09]. The literature
on traitor-tracing schemes is too vast to summarize here, and much of it focuses on constructing
schemes with properties that are not relevant to our application.

The work that is most related to ours is that of Kiayias and Yung [KY01], which considers 
traitor-tracing schemes that can trace stateful pirate decoders. However, their construction relies 
on a certain "watermarking assumption" that makes sense in the context of distributing copyrighted 
content, but does not apply when the messages are single bits, as is the case in our application. 
Conversely, our scheme relies on stronger assumptions about the pirate decoder (that only make
sense in the context of hardness results for differential privacy) than theirs does, making our results
incomparable to theirs.

Fingerprinting codes — an ingredient of our traitor-tracing scheme — were introduced by Boneh 
and Shaw [BS98], also for the problem of watermarking copyrighted content. Fingerprinting codes 
have been used extensively in constructions of traitor-tracing schemes (cf. Boneh and Naor [BN08]).
To achieve the best parameters for our scheme, we use a construction of fingerprinting codes of 
optimal length, due to Tardos [Tar08]. 

The Relationship with [DNV12] Dwork, Naor, and Vadhan [DNV12] gave information-theoretic
lower bounds for stateless sanitizers. These are sanitizers that take k queries as input, but
whose answers to each query do not depend on the other $k - 1$ input queries. They showed that
(even computationally unbounded) stateless sanitizers can answer at most $\sim n^2$ queries with
nontrivial accuracy, while satisfying differential privacy. The Laplace Mechanism is a stateless sanitizer
that answers $\sim n^2$ queries, and thus their result is tight in this respect.

Another interpretation of our results, that would lead to an alternative proof of our results, is 
that we construct a family of queries for which "keeping state doesn't help". Unfortunately, the use
of traitor-tracing schemes and fingerprinting codes in our proofs obscures the intuitive relationship 
between our arguments and those in [DNV12], so we will give an informal discussion here. 

They consider a game where an n-row database is chosen at random, and a random subset of
$n - 1$ of those rows is given to the attacker. The attacker wants to violate privacy by recovering
the n-th row. To do so, the attacker chooses $\sim n^2$ queries (randomly, from a distribution that
depends on the $n - 1$ known rows) and requests answers to these queries. Using these answers, they
show that there is a particular way for the attacker to (randomly) guess the missing row, that will 
succeed with sufficient probability to constitute a privacy violation. Their argument is in two steps: 
1) The expected correlation between the answers given by a stateless sanitizer and the answers on 
a database consisting only of the missing row is significant. 2) A stateless sanitizer cannot give 
answers that are correlated with too many rows that are not in the database. Combining these two 
steps shows that the attacker has a significant chance of identifying the n-th row from its correlation 
with the answers. 

Typically, the intuition behind the analysis of traitor-tracing schemes follows roughly the same 
two steps: 1) There will be some correlation between the decryptions returned by the efficient pirate 
and the decryptions that would be returned by some member of the coalition (using only his own 
key). 2) There will not be significant correlation between the decryptions returned by the efficient 
pirate and the decryptions that would be returned by any user not a member of the coalition. 
Indeed, if we "unrolled" the analysis of the fingerprinting code directly into our construction, we 
would make exactly the same arguments. 






Other Types of Sanitizers There are two other variants of the counting query problem that
have appeared in the literature. The first, which we have already discussed, is counting query
release. The second is interactive sanitization. This problem is the same as the one we consider,
where the sanitizer is given a database D and k queries from a family Q; however, the queries
arrive one at a time, and may be chosen adaptively. In this setting, we want the sanitizer to
answer each query efficiently (in time polynomial in d, n, and k). The Laplace Mechanism is,
in fact, interactive, and a line of work initiated by Roth and Roughgarden as well as Hardt and
Rothblum [RR10, HR10, GRU12] showed how to interactively answer nearly $2^n$ queries in time
$\mathrm{poly}(2^d, n)$ per query.

The three variants we've described satisfy some interesting relationships. First, if we have an 
algorithm that runs in time T and releases a summary that enables an analyst to answer any query 
in Q in time T, then we also have an interactive sanitizer that runs in time T per query that 
answers any k queries from Q. Secondly, if we have an interactive sanitizer that answers k queries 
from Q in time T per query, then we also have a non-interactive sanitizer that answers k queries
from Q in time Tk. Thus, holding Q fixed (and assuming k = poly(d, n)), the problem we consider 
is the easiest form of private counting query release, and the lower bounds we prove imply lower 
bounds for the other variants. 
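
The second relationship above is a simple wrapper, sketched here in Python. The class and function names are hypothetical, and the placeholder sanitizer returns exact answers, so it is not private; it only stands in for a real interactive mechanism with the same one-query-at-a-time interface.

```python
class ExactInteractiveSanitizer:
    """Placeholder interactive sanitizer that answers queries one at a time.

    Assumption: it returns exact answers and is therefore NOT private.
    It only models the interface of an interactive mechanism.
    """

    def __init__(self, database):
        self.database = list(database)
        self.num_answered = 0  # interactive sanitizers may keep state

    def answer(self, query):
        self.num_answered += 1
        return sum(query(row) for row in self.database) / len(self.database)

def non_interactive(make_interactive, database, queries):
    """Answer all k queries at once by feeding them, in order, to an
    interactive sanitizer. If each answer takes time T, this takes time Tk."""
    sanitizer = make_interactive(database)
    return [sanitizer.answer(q) for q in queries]

# Hypothetical toy data: 4 records with d = 2 binary attributes each.
db = [(1, 0), (0, 0), (1, 1), (1, 0)]
queries = [lambda x: x[0], lambda x: x[1]]
batch_answers = non_interactive(ExactInteractiveSanitizer, db, queries)
```

The wrapper never uses the fact that the queries could have been chosen adaptively, which is why a lower bound for the non-interactive problem carries over to the interactive one.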

For the case of interactive sanitization, these lower bounds are new. To our knowledge, prior 
to our work it was possible that there was an interactive sanitizer that ran in time poly(d, n) per 
query and answered nearly $2^n$ arbitrary counting queries, whereas our results show that there is no
efficient interactive sanitizer for significantly more than $n^2$ queries. On the other hand, for counting
query release, our results only show that it is hard to release a particular family of queries Q whose
size is at least $2^n$. For families of queries this large, the results of Dinur and Nissim [DN03] already
imply the impossibility of release, even by computationally unbounded algorithms. 

We note that typically Q is not the same in the data release problem as it is for sanitizers. 
For data release, Q must be smaller than $2^n$, and thus cannot contain all efficiently computable
counting queries. Sanitizers (both interactive and non-interactive) are supposed to circumvent this
problem by allowing the queries to be arbitrary, but only answering the k queries that are needed.
However, our results show that they can only circumvent the problem if $k \ll n^2$ queries are sufficient.

Hardness Results for Synthetic Data There has been considerable focus in differentially private
data analysis on sanitizers that produce synthetic data [BCD+07, BLR08, DNR+09, DRV10].
A sanitizer outputs synthetic data if, on input a database $D \in (\{0,1\}^d)^n$, it outputs a new database
$\hat{D} \in (\{0,1\}^d)^n$ that approximately preserves the answer to each of some set of queries. Synthetic
data is a natural and well-studied desideratum for private data analysis, and essentially all known techniques
for answering $\gg n^2$ queries (from sufficiently general families Q) output synthetic data. Even
many constructions of interactive sanitizers for answering $\gg n^2$ queries [RR10, HR10, GRU12] can
easily be modified to output a synthetic database. However, even the best of these mechanisms run
in time $\mathrm{poly}(2^d, n, k)$.

Unfortunately, Ullman and Vadhan [UV11], building on work by Dwork et al. [DNR+09] showed 
that exponential running time is inherent for sanitizers that output synthetic data, even if the 
synthetic database only has to preserve the answers to 2-way marginals (roughly, the mean of 
each column and the pairwise correlation between columns). These results apply even when the 
number of queries is $\ll n^2$, and thus apply to problems where efficient algorithms that do not
output synthetic data (e.g. the Laplace Mechanism) are known. Thus, these results say more about
the hardness of generating synthetic data, and the limitations of current techniques, than they do
about the hardness of answering large numbers of counting queries.

Answering Simple Counting Queries There has been a significant body of research on designing
improved algorithms for releasing "simple" families of queries (which, as discussed above,
implies interactive and non-interactive sanitizers for these families of queries). For instance, Hardt,
Rothblum, and Servedio [HRS12] as well as Thaler, Ullman, and Vadhan [TUV12] recently gave
algorithms for releasing the family of monotone k-way conjunctions. A monotone k-way conjunction
is specified by a subset of the columns, $S \subseteq [d]$, $|S| = k$, and asks "What fraction of records
in D have a 1 in every column in S?" Note that there are $\sim d^k$ such queries (for $k \ll d$). These
queries have been identified as an especially important family for differentially private data release
(cf. [BCD+07, KRSU10, GHRU11]). The two works mentioned give efficient algorithms capable of
releasing all monotone k-way conjunctions on a database of size $d^{O(\sqrt{k})}$, and thus are capable of
answering $n^{\Omega(\sqrt{k})} \gg n^2$ queries from this family (for a particular choice of n).
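
To make the definition of a monotone k-way conjunction concrete, here is a minimal Python sketch. The helper names are hypothetical and columns are indexed from 0 (an assumption of this sketch). It enumerates every k-subset S of the d columns and evaluates the corresponding counting query exactly, with no privacy protection.

```python
from itertools import combinations
from math import comb

def conjunction_query(columns):
    """Predicate that is 1 iff the row has a 1 in every column in `columns`."""
    return lambda row: int(all(row[j] == 1 for j in columns))

def all_k_way_conjunctions(d, k):
    """One monotone k-way conjunction per k-subset of the d columns."""
    return [conjunction_query(S) for S in combinations(range(d), k)]

def evaluate(query, database):
    """Exact (non-private) answer: fraction of rows satisfying the query."""
    return sum(query(row) for row in database) / len(database)

# Hypothetical toy data: 3 records with d = 4 binary attributes each.
db = [(1, 1, 0, 0), (1, 1, 1, 0), (0, 1, 1, 1)]
family = all_k_way_conjunctions(d=4, k=2)
fraction = evaluate(conjunction_query((0, 1)), db)
```

The family has exactly $\binom{d}{k}$ members, which is where the $\sim d^k$ count for $k \ll d$ comes from.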

Thus there is a significant gap between k-way conjunctions, for which there are efficient, non- 
trivial improvements on the Laplace Mechanism, and polynomial-size depth-6 circuits, for which 
we can show there is no efficient algorithm that significantly improves on the Laplace Mechanism. 

2 Preliminaries 

Differentially Private Algorithms Let a database $D \in (\{0,1\}^d)^n$ be a collection of n rows
$x^{(1)}, \dots, x^{(n)} \in \{0,1\}^d$. We say that two databases $D, D' \in (\{0,1\}^d)^n$ are adjacent if they differ
only on a single row, and we denote this by $D \sim D'$. Let $\mathcal{M} : (\{0,1\}^d)^n \to \mathcal{R}$ be a randomized
algorithm that takes a database as input. For ease of notation, we will write $\mathcal{M}$ as a function on
databases in $(\{0,1\}^d)^n$, but we will always think of the sanitizer as taking databases of arbitrary
dimensions as input, and thus the parameters in various definitions may depend on the dimensions
of the input.

Definition 2.1 (Differential Privacy [DMNS06]). An algorithm $\mathcal{M}$ is $(\varepsilon, \delta)$-differentially private if
for every two adjacent databases $D \sim D' \in (\{0,1\}^d)^n$ and every subset $S \subseteq \mathcal{R}$,

$$\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \Pr[\mathcal{M}(D') \in S] + \delta.$$

If $\mathcal{M}$ is $(\varepsilon, \delta)$-differentially private for some functions $\varepsilon = \varepsilon(n) = O(1)$, $\delta = \delta(n) = o(1/n)$, we will
drop the parameters $\varepsilon$ and $\delta$ and say that $\mathcal{M}$ is differentially private.
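
As a brief worked example of Definition 2.1 (a standard calculation, included only for illustration and not part of this paper's proofs), consider a single counting query q, where $q(D)$ is the fraction of rows of D satisfying q. Assume the mechanism outputs $q(D) + Z$, where Z has the Laplace density $\frac{\varepsilon n}{2} e^{-\varepsilon n |z|}$. Adjacent databases differ in one row, so $|q(D) - q(D')| \le 1/n$. Writing $p_D$ for the density of the output on input D, the triangle inequality gives, for every output a,

$$\frac{p_D(a)}{p_{D'}(a)} = \exp\big(\varepsilon n \, (|a - q(D')| - |a - q(D)|)\big) \le \exp\big(\varepsilon n \, |q(D) - q(D')|\big) \le e^{\varepsilon}.$$

Integrating over any set S yields $\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \Pr[\mathcal{M}(D') \in S]$, so under these assumptions the mechanism satisfies the definition with $\delta = 0$.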

The choice of $\varepsilon = O(1)$, $\delta = o(1/n)$ is a fairly weak setting of the privacy parameters, and
most known constructions of differentially private mechanisms for various tasks give quantitatively
stronger privacy guarantees. These parameters are essentially the weakest parameters possible, as
$(\varepsilon, \delta)$-differential privacy is not a meaningful privacy guarantee for $\varepsilon = \omega(1)$ or $\delta = \Omega(1/n)$. The
fact that our lower bounds apply to these parameters makes our results stronger.

Sanitizers for Counting Queries Since an algorithm that always outputs $\bot$ satisfies Definition 2.1,
we also need to specify what it means for the sanitizer to be useful. In this paper we study
sanitizers that give accurate answers to counting queries. A counting query on $\{0,1\}^d$ is defined
by a predicate $q : \{0,1\}^d \to \{0,1\}$. Abusing notation, we define the evaluation of the query q on a
database $D = (x^{(1)}, \dots, x^{(n)}) \in (\{0,1\}^d)^n$ to be

$$q(D) = \frac{1}{n} \sum_{i=1}^{n} q(x^{(i)}).$$

We will use $Q^{(d)}$ to denote a set of counting queries on $\{0,1\}^d$ and $Q = \bigcup_{d \in \mathbb{N}} Q^{(d)}$.

We are interested in sanitizers that take a sequence of queries from some set Q as input, which
we call generic sanitizers. Formally, a generic sanitizer is a function $\mathcal{M} : (\{0,1\}^d)^n \times (Q^{(d)})^k \to \mathbb{R}^k$.
Notice that we assume that $\mathcal{M}$ outputs k real-valued answers. Think of the j-th component of the
output of $\mathcal{M}$ as an answer to the j-th query. For the results in this paper, this assumption will be
without loss of generality.$^2$ Again, for ease of notation, we will write $\mathcal{M}$ as a function on k queries
in $Q^{(d)}$, but we will always think of the sanitizer as taking a database of arbitrary dimensions
$n \times d$ and an arbitrary number of queries from Q as input. Thus the parameters in various
definitions may depend on the dimensions of the input and the number of queries. Definition 2.1
extends naturally to generic sanitizers by requiring that for every $q_1, \dots, q_k \in Q$, the sanitizer
$\mathcal{M}_{q_1, \dots, q_k}(\cdot) = \mathcal{M}(\cdot, q_1, \dots, q_k)$ is $(\varepsilon, \delta)$-differentially private as a function of the input database.
Now we formally define what it means for a generic sanitizer to give accurate answers.

Definition 2.2 (Accuracy). Let D be a database and $q_1, \dots, q_k$ be a set of counting queries. A
sequence of answers $a_1, \dots, a_k$ is $\alpha$-accurate for D and $q_1, \dots, q_k$ if

$$\max_{j \in [k]} |q_j(D) - a_j| \le \alpha.$$

Let Q be a set of counting queries, $k \in \mathbb{N}$ and $\alpha, \beta \in [0, 1]$ be parameters. A generic sanitizer
$\mathcal{M}$ is $(\alpha, \beta, Q, k)$-accurate if for every database $D \in (\{0,1\}^d)^n$ and every sequence of queries
$q_1, \dots, q_k \in Q^{(d)}$,

$$\Pr_{\mathcal{M}\text{'s coins}}\big[\mathcal{M}(D, q_1, \dots, q_k) \text{ is } \alpha\text{-accurate for } D \text{ and } q_1, \dots, q_k\big] \ge 1 - \beta.$$

If $\mathcal{M}$ is $(\alpha, \beta, Q, k)$-accurate for any (constant) $\alpha < 1/2$ and $\beta = \beta(n) = o(1/n^2)$, we will drop $\alpha$
and $\beta$ and say that $\mathcal{M}$ is $(Q, k)$-accurate.
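
The accuracy condition in Definition 2.2 is easy to check directly. The Python sketch below uses hypothetical helper names and a toy database. It also includes the trivial mechanism that answers 1/2 to every query, which is always 1/2-accurate and shows why only $\alpha < 1/2$ is a meaningful requirement.

```python
def evaluate(query, database):
    """Exact answer q(D): fraction of rows satisfying the predicate."""
    return sum(query(row) for row in database) / len(database)

def is_alpha_accurate(answers, database, queries, alpha):
    """Check that max_j |q_j(D) - a_j| <= alpha, as in Definition 2.2."""
    worst = max(abs(evaluate(q, database) - a)
                for q, a in zip(queries, answers))
    return worst <= alpha

def always_half(database, queries):
    """Trivial mechanism that ignores the database and answers 1/2 everywhere."""
    return [0.5 for _ in queries]

# Hypothetical toy data: 2 records with a single binary attribute.
db = [(1,), (0,)]
queries = [lambda x: 1, lambda x: x[0]]  # true answers are 1.0 and 0.5
answers = always_half(db, queries)
trivially_accurate = is_alpha_accurate(answers, db, queries, alpha=0.5)
nontrivially_accurate = is_alpha_accurate(answers, db, queries, alpha=0.49)
```

Because the trivial mechanism never reads the database, it is perfectly private, so a hardness result has to target accuracy strictly better than 1/2.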

The choice of $\alpha < 1/2$, $\beta = o(1/n^2)$ is also significantly weaker than what can be achieved
by known constructions of generic sanitizers. These parameters are also essentially the weakest
parameters possible, as a mechanism that answers 1/2 to every query achieves $\alpha = 1/2$, $\beta = 0$
for any number of arbitrary queries. Also, if there is a mechanism that achieves $({<}\,1/2, \beta, Q, k)$-accuracy
for $\beta < 1/2$, then there is another mechanism that achieves $({<}\,1/2, o(1/n^2), Q, k)$-accuracy
with only a small loss in the privacy parameters and the efficiency of the mechanism. The fact that
our lower bound applies to these parameters makes our results stronger.

$^2$ In certain settings, $\mathcal{M}(D, q_1, \dots, q_k)$ is allowed to output a "summary" $z \in \mathcal{R}$ for some range $\mathcal{R}$. In this case we
would also require that there exists an "evaluator" $\mathcal{E} : \mathcal{R} \times Q \to \mathbb{R}$ that takes a summary and a query and returns
an answer $\mathcal{E}(z, q) = a$ that approximates $q(D)$. The extra generality is used to allow $\mathcal{M}$ to run in less time than the
number of queries it is answering (e.g. releasing a fixed family of queries), but we can ignore this issue for our range of
parameters. A generic sanitizer $\mathcal{M}$ that outputs a summary in $\mathcal{R}$ can be converted to a generic sanitizer with output
in $\mathbb{R}^k$ simply by running $\mathcal{M}(D, q_1, \dots, q_k)$ to obtain $z \in \mathcal{R}$ and then computing $a_1 = \mathcal{E}(z, q_1), \dots, a_k = \mathcal{E}(z, q_k)$. This
conversion increases the running time by a factor of k, which is the minimum time required to read the input queries.
Thus, our assumption is without loss of generality.

Efficiency of Generic Sanitizers Simply put, a generic sanitizer is efficient if it runs in time polynomial
in the length of its input. To make the statement more precise, we need to specify how the
queries are given to the sanitizer as input.

Notice that to specify an arbitrary counting query $q : \{0,1\}^d \to \{0,1\}$ requires $2^d$ bits and thus
it may require time $2^d$ to evaluate. In this case, a sanitizer whose running time is polynomial in
the time required to evaluate the query is not especially efficient; that is, its running time is not
poly(d, n, k). Thus, we restrict attention to queries that are efficiently computable, and have a
succinct representation. In this work, we will fix the representation to be Boolean circuits over the
basis $\{\wedge, \vee, \neg\}$ with possibly unbounded fan-in. In this representation, a query whose description
length is s can also be evaluated in time poly(s).
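
As an illustration of this representation, the following Python sketch evaluates a query given as a circuit over AND, OR, and NOT with unbounded fan-in. The nested-tuple encoding is an assumption of this sketch, not the paper's formalism, and it represents circuits as trees (formulas) rather than general DAGs. Evaluation visits each gate once, so it takes time linear in the size of the description.

```python
def evaluate_circuit(gate, x):
    """Evaluate a circuit on the record x (a tuple of bits).

    Hypothetical encoding: ("var", i), ("not", g), ("and", [g, ...]),
    ("or", [g, ...]); AND and OR gates may have any number of inputs.
    """
    kind = gate[0]
    if kind == "var":
        return x[gate[1]]
    if kind == "not":
        return 1 - evaluate_circuit(gate[1], x)
    inputs = [evaluate_circuit(g, x) for g in gate[1]]
    if kind == "and":
        return int(all(inputs))
    if kind == "or":
        return int(any(inputs))
    raise ValueError(f"unknown gate type: {kind}")

def depth(gate):
    """Length of the longest path of gates from the output to an input variable."""
    kind = gate[0]
    if kind == "var":
        return 0
    if kind == "not":
        return 1 + depth(gate[1])
    return 1 + max(depth(g) for g in gate[1])

# q(x) = (x_0 AND x_1) OR (NOT x_2), with 0-indexed attributes.
circuit = ("or", [("and", [("var", 0), ("var", 1)]), ("not", ("var", 2))])
outputs = [evaluate_circuit(circuit, x) for x in [(1, 1, 1), (0, 1, 1), (0, 0, 0)]]
```

Counting a NOT gate toward the depth is a convention chosen here; other conventions push negations to the inputs and count only AND and OR layers.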

For a function $s : \mathbb{N} \to \mathbb{N}$, we use $Q^{(d)}_s$ to denote the set of all counting queries on $\{0,1\}^d$ specified
by circuits of size $s(d)$. In this work we also consider the case where the queries are computable
by circuits of low depth. Similarly, for functions $s, h : \mathbb{N} \to \mathbb{N}$, we use $Q^{(d)}_{s,h}$ to denote the set of all
counting queries on $\{0,1\}^d$ specified by circuits of size $s(d)$ and depth $h(d)$. Finally, we use $Q^{(d)}_{\mathrm{all}}$
to denote the set of all counting queries on $\{0,1\}^d$.

Definition 2.3 (Efficient Generic Sanitizers). A generic sanitizer $\mathcal{M}$ is efficient if, on input a
database $D \in (\{0,1\}^d)^n$ and k queries $q_1, \dots, q_k \in Q^{(d)}_s$, $\mathcal{M}$ runs in time $\mathrm{poly}(d, n, k, s(d))$. A
generic sanitizer $\mathcal{M}$ is efficient for depth-h queries if, on input a database $D \in (\{0,1\}^d)^n$ and k
queries $q_1, \dots, q_k \in Q^{(d)}_{s,h}$, $\mathcal{M}$ runs in time $\mathrm{poly}(d, n, k, s(d))$.

For comparison with our results, we will recall the properties of some known mechanisms: 

Theorem 2.4 (Laplace Mechanism [DN03, DMNS06]). There exists a generic sanitizer $\mathcal{M}_{\mathrm{Lap}}$ that
is 1) differentially private, 2) efficient, and 3) $(Q_{\mathrm{all}}, \tilde{\Omega}(n^2))$-accurate.

Note that since this mechanism is efficient, it is also efficient for depth-h queries.

Theorem 2.5 ("Advanced Query Release Mechanisms" [BLR08, DNR+09, DRV10, HR10, HLM10]).
There exists a generic sanitizer $\mathcal{M}_{\mathrm{Adv}}$ that is 1) differentially private and 2) $(Q_{\mathrm{all}}, 2^{\tilde{\Omega}(n/\sqrt{d})})$-accurate.
For queries in $Q^{(d)}_s$, $\mathcal{M}_{\mathrm{Adv}}$ runs in time $\mathrm{poly}(2^d, n, k, s(d))$.

As we mentioned above, these mechanisms can achieve stronger quantitative privacy and accuracy
guarantees (in terms of $\varepsilon, \delta$ for privacy and $\alpha, \beta$ for accuracy) with only a small degradation in
the number of queries. However, we state the results for our relaxed choice of parameters both for
simplicity and for comparison with our negative results. Also, notice that both of these mechanisms
provide accuracy guarantees that are independent of the complexity of the queries (although the
running time of the mechanism will depend on the complexity of the queries). Our hardness results
will apply to sanitizers that only provide accuracy for queries of size poly(d, n).

3 Traitor-Tracing Schemes and Other Cryptographic Primitives

In this section we give definitions of traitor-tracing schemes and of the other cryptographic
primitives that will be useful in proving our result. For clarity, we will use $A_{\mathrm{TT}}$ and $A_{\mathrm{FP}}$ to
denote algorithms associated with traitor-tracing schemes and fingerprinting codes, respectively.
Algorithms associated with encryption schemes will typically have no subscript; however, we will
use the subscript Enc for their associated parameters.

3.1 Traitor-Tracing Schemes

We now give a definition of a traitor-tracing scheme, heavily tailored to the task of proving hardness 
results for generic sanitizers. We will sacrifice some consistency with the standard definitions. See 
below for further discussion of the ways in which our definition departs from the standard definition 
of traitor-tracing. In some cases, the non-standard aspects of the definition will be necessary to 
establish our results, and in others it will be for convenience. Despite these differences, we will 
henceforth refer to schemes satisfying our definition simply as traitor-tracing schemes. 

Intuitively, a traitor-tracing scheme is a form of broadcast encryption, in which a sender can 
broadcast an encrypted message that can be decrypted by each of a large set of users. The standard 
notion of security for such a scheme would require that an adversary that doesn't have any of the 
keys cannot decrypt the message. A traitor-tracing scheme has the additional property that given 
any decoder capable of decrypting the message (which must "know" some of the keys) , there is a 
procedure for determining which user's key is being used. Moreover, we want the scheme to be 
"collusion resilient", in that even if a coalition of users gets together and combines their keys in 
some way to produce a decoder, there is still a procedure that identifies at least one member of the 
coalition. 

We now describe the syntax of a traitor-tracing scheme more formally. For functions $n, k_{TT} : \mathbb{N} \to \mathbb{N}$,
an $(n, k_{TT})$-traitor-tracing scheme is a tuple of algorithms $(\mathrm{Gen}_{TT}, \mathrm{Enc}_{TT}, \mathrm{Dec}_{TT}, \mathrm{Trace}_{TT})$. We
allow all the algorithms to be randomized except for $\mathrm{Dec}_{TT}$.³

• The algorithm $\mathrm{Gen}_{TT}$ takes a security parameter $\kappa$ and returns a set of $n = n(\kappa)$ user keys
$\mathbf{sk} = (sk^{(1)}, \dots, sk^{(n)})$, each in $\{0,1\}^\kappa$. Formally, $\mathbf{sk} = (sk^{(1)}, \dots, sk^{(n)}) \leftarrow_R \mathrm{Gen}_{TT}(1^\kappa)$.

• The algorithm $\mathrm{Enc}_{TT}$ takes a set of user keys $\mathbf{sk}$ and a message bit $b \in \{0,1\}$, and generates
a ciphertext $c \in \mathcal{C} = \mathcal{C}^{(\kappa)}$. Formally, $c \leftarrow_R \mathrm{Enc}_{TT}(\mathbf{sk}, b)$.

• The algorithm $\mathrm{Dec}_{TT}$ takes any single user key $sk$ and a ciphertext $c \in \mathcal{C}$, runs in time
$\mathrm{poly}(\kappa, n(\kappa))$, and deterministically returns a message bit $b \in \{0,1\}$. Formally, $b = \mathrm{Dec}_{TT}(sk, c)$.

• The algorithm $\mathrm{Trace}_{TT}$ takes a set of user keys $\mathbf{sk} \in (\{0,1\}^\kappa)^{n(\kappa)}$ as regular input and an
oracle $\mathcal{P} : (\mathcal{C}^{(\kappa)})^{k_{TT}(\kappa)} \to \{0,1\}^{k_{TT}(\kappa)}$, makes at most $k_{TT} = k_{TT}(\kappa)$ non-adaptive queries
$c_1, \dots, c_{k_{TT}} \in \mathcal{C}^{(\kappa)}$ to its oracle, and returns the name of a user $i \in [n(\kappa)]$. Formally, $i \leftarrow_R
\mathrm{Trace}_{TT}^{\mathcal{P}}(\mathbf{sk})$.

Intuitively, think of the oracle $\mathcal{P}$ as being given some subset of keys $\mathbf{sk}_S = (sk^{(i)})_{i \in S}$ for a
non-empty set $S \subseteq [n]$, and $\mathrm{Trace}_{TT}$ is attempting to identify a user $i \in S$. Clearly, if $\mathcal{P}$ ignores
its input and always returns 0, $\mathrm{Trace}_{TT}$ cannot have any hope of success, so we need to place
some condition on $\mathcal{P}$ that allows $\mathrm{Trace}_{TT}$ to succeed. Roughly, we want to require that $\mathcal{P}$
is successfully decrypting messages encrypted under the keys $\mathbf{sk}$; however, for convenience, we will
place a stronger requirement on $\mathcal{P}$. Note that making stronger assumptions about $\mathcal{P}$ can only help
the tracing algorithm, so as long as the assumptions are still valid in the intended application, they
cannot hurt. Specifically, we will place the following requirement on $\mathcal{P}$.

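Definition 3.1 (Available Pirate Decoder). Let $\Pi_{TT} = (\mathrm{Gen}_{TT}, \mathrm{Enc}_{TT}, \mathrm{Dec}_{TT}, \mathrm{Trace}_{TT})$ be an
$(n, k_{TT})$-traitor-tracing scheme. Let $\mathcal{P}$ be a (possibly randomized) algorithm. We say that $\mathcal{P}$ is
a $k_{TT}$-available pirate decoder (referred to below also as a $k_{TT}$-useful pirate decoder) if for every $\kappa \in \mathbb{N}$,
every set of user keys $\mathbf{sk} = (sk^{(1)}, \dots, sk^{(n)})$, each in $\{0,1\}^\kappa$, every $S \subseteq [n]$ such that $|S| \ge n - 1$,
and every $c_1, \dots, c_{k_{TT}} \in \mathcal{C}^{(\kappa)}$,

$$\Pr_{(\hat{b}_1, \dots, \hat{b}_{k_{TT}}) \leftarrow_R \mathcal{P}(\mathbf{sk}_S, c_1, \dots, c_{k_{TT}})} \left[ \exists j \in [k_{TT}],\ b \in \{0,1\} : \left( \forall i \in S,\ \mathrm{Dec}_{TT}(sk^{(i)}, c_j) = b \right) \wedge \left( \hat{b}_j \ne b \right) \right] \le o\left( \frac{1}{n(\kappa)^2} \right).$$

In other words, if every user key $sk^{(i)}$ (for $i \in S$) decrypts $c_j$ to 1 (resp. 0), then $\mathcal{P}(\mathbf{sk}_S, \cdot)$ decrypts
$c_j$ to 1 (resp. 0).

³ It would not substantially affect our results if $\mathrm{Dec}_{TT}$ were randomized; however, we will assume that $\mathrm{Dec}_{TT}$ is
deterministic for ease of presentation.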

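We can now define a secure $(n, k_{TT})$-traitor-tracing scheme:

Definition 3.2 (Traitor-Tracing Scheme). Let $\Pi_{TT} = (\mathrm{Gen}_{TT}, \mathrm{Enc}_{TT}, \mathrm{Dec}_{TT}, \mathrm{Trace}_{TT})$ be an $(n, k_{TT})$-
traitor-tracing scheme. Let $k_{TT} : \mathbb{N} \to \mathbb{N}$ be a function. We say that $\Pi_{TT}$ is a secure $(n, k_{TT})$-traitor-
tracing scheme if for every (possibly randomized) algorithm $\mathcal{P}$ that 1) runs in $\mathrm{poly}(\kappa, n(\kappa), k_{TT}(\kappa))$
time and 2) is a $k_{TT}$-available pirate decoder, and for every $S \subseteq [n(\kappa)]$ such that $|S| \ge n(\kappa) - 1$,

$$\Pr_{\substack{\mathbf{sk} \leftarrow_R \mathrm{Gen}_{TT}(1^\kappa) \\ \mathcal{P}\text{'s},\ \mathrm{Trace}_{TT}\text{'s coins}}} \left[ \mathrm{Trace}_{TT}^{\mathcal{P}(\mathbf{sk}_S, \cdot)}(\mathbf{sk}) \notin S \right] = o\left( \frac{1}{n(\kappa)} \right).$$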



Remarks About Our Definition of Traitor-Tracing. The traitor-tracing schemes we consider
are somewhat different from those previously studied in the literature. Specifically:

• Our traitor-tracing schemes are private key in every respect. That is, we do not require the 
encryption or tracing algorithms to use public keys. In the typical application of traitor- 
tracing schemes to content distribution, these would be desirable features, however they are 
not necessary for this application. We take advantage of this relaxation in two ways: 1) Since 
we do not require a public-key cryptosystem, our first result (Theorem 1.1) only needs to 
assume the existence of one-way functions. 2) Since private-key cryptosystems are easier to 
construct, we are able to find a candidate scheme whose decryption can be implemented by 
constant-depth circuits, which we use to instantiate Theorem 1.2. 

• We only require that the tracing algorithm succeeds with probability $1 - o(1/n) = 1 -
1/\mathrm{poly}(\kappa)$ (in the most common regime where $n = n(\kappa) = \mathrm{poly}(\kappa)$). Typical traitor-tracing
schemes would require that the tracing algorithm succeeds with probability $1 - \mathrm{negl}(\kappa)$. As
above, we use this extra flexibility to find a candidate scheme whose decryption can be
implemented by constant-depth circuits.

• We do not give the pirate decoder access to an encryption oracle. In other words, we do not 
require CPA security. Most traitor-tracing schemes in the literature are public-key, making 
this distinction irrelevant. Here, we only need an encryption scheme that is secure for an a 
priori bounded number of messages. As above, we use this extra flexibility to find a candidate 
scheme whose decryption can be implemented by constant-depth circuits. 

• We do not require the key generation, encryption, or tracing algorithms to be efficient, as
the standard definition of differential privacy requires the privacy guarantees to hold for every
database (which, roughly, will be generated by the key generation algorithm) and every set of
queries (which, roughly, will correspond to the encryption and tracing algorithms). However,
we do require decryption to be efficient, as the queries used in our results must compute
decryption, and we require that the queries are efficiently computable. In our constructions
all of these algorithms will be efficient, but we are hopeful that the extra flexibility may be
useful for proving stronger lower bounds for differentially private data analysis.

• We allow the pirate decoder to be stateful, but in an unusual way. The pirate is actually 
allowed to see all the queries at once, whereas typically even traitor-tracing schemes for 
stateless adversaries are allowed to query the pirate adaptively, and in this sense our model of 
a stateful pirate is more powerful than what has been previously considered. However, we do 
require (roughly) that if any of the queries are ciphertexts generated by $\mathrm{Enc}(sk, b)$, then the
pirate decoder answers $b$ to those queries, regardless of the other queries issued. In typical
applications of traitor-tracing, the pirate could simply answer $\bot$ to every query if it detects
that any of them are malformed. Kiayias and Yung [KY01] addressed this problem in the 
context of content distribution, and showed a generic transformation from a traitor-tracing 
scheme that traces stateless pirates to one that traces stateful pirates (for the more standard 
notion of stateful pirates). Their approach relies on a particular "watermarking assumption" 
that cannot apply to bit-encryption, making their results incomparable to ours. 

3.2 Fingerprinting Codes 

To construct a traitor-tracing scheme that satisfies Definition 3.2, we will make use of (binary)
fingerprinting codes, introduced by Boneh and Shaw [BS98]. A fingerprinting code is a pair of
efficient (possibly randomized) algorithms $(\mathrm{Gen}_{FP}, \mathrm{Trace}_{FP})$ with the following syntax.

• The algorithm $\mathrm{Gen}_{FP}$ takes a number of users $n$ as input and outputs a codebook of $n$ codewords
of length $\ell_{FP} = \ell_{FP}(n)$, $W = (w^{(1)}, \dots, w^{(n)})$, each in $\{0,1\}^{\ell_{FP}}$. Formally, $W \leftarrow_R \mathrm{Gen}_{FP}(1^n)$.
We will typically treat $W \in \{0,1\}^{n \times \ell_{FP}}$ as a matrix with each row containing a codeword.

• The algorithm $\mathrm{Trace}_{FP}$ takes an $n$-user codebook $W$ and a word $w' \in \{0,1\}^{\ell_{FP}}$ and returns
an index $i \in [n]$. Formally, $i = \mathrm{Trace}_{FP}(W, w')$.

Given a non-empty subset $S \subseteq [n]$ and a set of codewords $W_S = (w^{(i)})_{i \in S}$, each in $\{0,1\}^{\ell_{FP}}$, we define
the set of feasible codewords to be

$$F(W_S) = \left\{ w' \in \{0,1\}^{\ell_{FP}} \;\middle|\; \forall j \in [\ell_{FP}]\ \exists i \in S :\ w'_j = w^{(i)}_j \right\}.$$

Informally, for every $w' \in F(W_S)$, every bit $w'_j$ appears somewhere in the $j$-th column of $W_S$ when
$W_S$ is viewed as a matrix with a codeword in each row.
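The feasibility condition can be checked mechanically. The sketch below is illustrative only: `is_feasible` tests membership in $F(W_S)$ column by column, and `critical_positions` lists the columns on which the whole coalition agrees, which is an assumption about what the notation $\mathrm{Crit}(W_S)$ used later in the paper refers to. Both function names are hypothetical, and codewords are represented as Python lists of 0/1 values.

```python
def is_feasible(coalition_codewords, word):
    """Return True if every bit of `word` appears in the matching column
    of the coalition's codewords (i.e. word is in F(W_S))."""
    for j, bit in enumerate(word):
        column = {w[j] for w in coalition_codewords}
        if bit not in column:
            return False
    return True


def critical_positions(coalition_codewords):
    """Columns on which all coalition codewords agree (assumed meaning of Crit)."""
    length = len(coalition_codewords[0])
    return [j for j in range(length)
            if len({w[j] for w in coalition_codewords}) == 1]
```

On a critical column the coalition has seen only one bit value, so a feasible word is forced to use that value there; on the other columns either bit is allowed.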

We define the security of a fingerprinting code $(\mathrm{Gen}_{FP}, \mathrm{Trace}_{FP})$ as follows:

Definition 3.3 (Secure Fingerprinting Code). Let $\varepsilon_{FP} : \mathbb{N} \to [0,1]$ and $\ell_{FP} : \mathbb{N} \to \mathbb{N}$ be functions.
A pair of algorithms $(\mathrm{Gen}_{FP}, \mathrm{Trace}_{FP})$ is an $(\varepsilon_{FP}, \ell_{FP})$-fingerprinting code if $\mathrm{Gen}_{FP}(1^n)$ outputs a
codebook $W \in \{0,1\}^{n \times \ell_{FP}(n)}$, and furthermore, for every (possibly inefficient) algorithm $\mathcal{A}_{FP}$, and
every non-empty $S \subseteq [n]$,

$$\Pr_{\substack{W \leftarrow_R \mathrm{Gen}_{FP}(1^n) \\ w' \leftarrow_R \mathcal{A}_{FP}(S, W_S)}} \left[ w' \in F(W_S) \wedge \mathrm{Trace}_{FP}(W, w') \notin S \right] \le \varepsilon_{FP}(n).$$

Tardos [Tar08] gave a construction of fingerprinting codes of essentially optimal length, improving
on the original construction of Boneh and Shaw [BS98].

Theorem 3.4 ([Tar08]). For every function $\varepsilon_{FP} : \mathbb{N} \to [0,1]$, there exists an $(\varepsilon_{FP}, 100 n^2 \log(n/\varepsilon_{FP}))$-
fingerprinting code. In particular, there exists a $(1/n^3, 400 n^2 \log n)$-fingerprinting code.
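The "in particular" clause is a direct substitution: with $\varepsilon_{FP} = 1/n^3$ we get $\log(n/\varepsilon_{FP}) = \log(n^4) = 4 \log n$, so the length is $400 n^2 \log n$. A small numeric check, using a hypothetical helper `tardos_length` and the natural logarithm (the identity holds for any base), is:

```python
import math


def tardos_length(n, eps):
    """Codeword length 100 * n^2 * log(n / eps) from the stated bound."""
    return 100 * n ** 2 * math.log(n / eps)


def tardos_length_cubed_error(n):
    """Specialization to eps = 1/n^3, which simplifies to 400 * n^2 * log n."""
    return 400 * n ** 2 * math.log(n)
```

For example, `tardos_length(10, 10 ** -3)` and `tardos_length_cubed_error(10)` agree up to floating-point rounding.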

3.3 Encryption Schemes 

We will build our traitor-tracing scheme from a suitable encryption scheme. An encryption scheme
is a tuple of efficient algorithms (Gen, Enc, Dec). All the algorithms may be randomized except for
Dec. The scheme has the following syntactic properties:

• The algorithm Gen takes a security parameter $\kappa$, runs in time $\mathrm{poly}(\kappa)$, and returns a private
key $sk \in \{0,1\}^\kappa$. Formally, $sk \leftarrow_R \mathrm{Gen}(1^\kappa)$.

• The algorithm Enc takes a private key and a message bit $b \in \{0,1\}$ and generates a ciphertext
$c \in \mathcal{C} = \mathcal{C}^{(\kappa)}$. Formally, $c \leftarrow_R \mathrm{Enc}(sk, b)$.

• The algorithm Dec takes a private key $sk \in \{0,1\}^\kappa$ and a ciphertext $c \in \mathcal{C}^{(\kappa)}$, runs in time
$\mathrm{poly}(\kappa)$, and returns a message bit $b$.

First we define (perfectly) correct decryption.⁴

Definition 3.5 (Correctness). An encryption scheme (Gen, Enc, Dec) is (perfectly) correct if for
every $b \in \{0,1\}$, and every $\kappa \in \mathbb{N}$,

$$\Pr_{\substack{sk \leftarrow_R \mathrm{Gen}(1^\kappa) \\ c \leftarrow_R \mathrm{Enc}(sk, b)}} \left[ \mathrm{Dec}(sk, c) = b \right] = 1.$$

We require that our schemes have the following $k_{Enc}$-message security property.

Definition 3.6 (Security of Encryption). Let $\varepsilon_{Enc} : \mathbb{N} \to [0,1]$ and $k_{Enc}, T_{Enc} : \mathbb{N} \to \mathbb{N}$ be functions.
An encryption scheme $\Pi_{Enc} = (\mathrm{Gen}, \mathrm{Enc}, \mathrm{Dec})$ is an $(\varepsilon_{Enc}, k_{Enc}, T_{Enc})$-secure encryption scheme if for
every $T_{Enc}(\kappa, k_{Enc}(\kappa))$-time algorithm $\mathcal{A}_{Enc}$ and every $b = (b_1, \dots, b_{k_{Enc}}),\ b' = (b'_1, \dots, b'_{k_{Enc}}) \in \{0,1\}^{k_{Enc}}$
(for $k_{Enc} = k_{Enc}(\kappa)$),

$$\left| \Pr_{sk \leftarrow_R \mathrm{Gen}(1^\kappa)} \left[ \mathcal{A}_{Enc}(c_1, \dots, c_{k_{Enc}}) = 1 \right] - \Pr_{sk \leftarrow_R \mathrm{Gen}(1^\kappa)} \left[ \mathcal{A}_{Enc}(c'_1, \dots, c'_{k_{Enc}}) = 1 \right] \right| \le \varepsilon_{Enc}(\kappa),$$

where $c_1, \dots, c_{k_{Enc}}$ are chosen randomly as $\mathrm{Enc}(sk, b_1), \dots, \mathrm{Enc}(sk, b_{k_{Enc}})$ and $c'_1, \dots, c'_{k_{Enc}}$ are chosen
randomly as $\mathrm{Enc}(sk, b'_1), \dots, \mathrm{Enc}(sk, b'_{k_{Enc}})$.

Notice that we do not require $\Pi_{Enc}$ to be secure against adversaries that are given $\mathrm{Enc}(sk, \cdot)$ as
an oracle. That is, we do not require CPA security.

Definition 3.7 (Weak Encryption Scheme). A tuple of algorithms $\Pi_{Enc} = (\mathrm{Gen}, \mathrm{Enc}, \mathrm{Dec})$ is an
$(\varepsilon_{Enc}, k_{Enc}, T_{Enc})$-encryption scheme if it satisfies correctness and $(\varepsilon_{Enc}, k_{Enc}, T_{Enc})$-security.

⁴ It would not substantially affect our results if Dec were allowed to fail with negligible probability; however, we
will assume perfect correctness for ease of presentation.

3.4 Decryption Function Families

For Theorem 1.2, we want to consider encryption and traitor-tracing schemes where Dec is a
"simple" function of the user key (for every ciphertext $c \in \mathcal{C}$ and user key $sk \in \{0,1\}^\kappa$).

Definition 3.8 (Decryption Function Family). Let (Gen, Enc, Dec) be a tuple of algorithms where
Gen produces keys in $\{0,1\}^\kappa$ and Enc produces ciphertexts in $\mathcal{C} = \mathcal{C}^{(\kappa)}$. For every $c \in \mathcal{C}$, we define
the $c$-decryption function $f_c : \{0,1\}^\kappa \to \{0,1\}$ to be $f_c(sk) = \mathrm{Dec}(sk, c)$. We define the decryption
function family $\mathcal{F}_{\mathrm{Dec},\kappa} = \{f_c\}_{c \in \mathcal{C}^{(\kappa)}}$.

For a traitor-tracing scheme $(\mathrm{Gen}_{TT}, \mathrm{Enc}_{TT}, \mathrm{Dec}_{TT}, \mathrm{Trace}_{TT})$, the family $\mathcal{F}_{\mathrm{Dec}_{TT},\kappa}$ is defined
analogously.

In what follows, we will say that $\Pi_{Enc}$ (resp. $\Pi_{TT}$) is an encryption scheme (resp. traitor-tracing
scheme) with decryption function family $\mathcal{F}_{\mathrm{Dec},\kappa}$.

4 Attacking Efficient Generic Sanitizers 

In this section we will prove our main result, showing that the existence of traitor-tracing schemes 
(in the sense of Definition 3.2) implies that efficient generic sanitizers cannot guarantee differential 
privacy. 

Theorem 4.1 (Attacking Efficient Generic Sanitizers). Assume there exists an $(n, k_{TT})$-secure
traitor-tracing scheme $\Pi_{TT} = (\mathrm{Gen}_{TT}, \mathrm{Enc}_{TT}, \mathrm{Dec}_{TT}, \mathrm{Trace}_{TT})$ with the decryption function family
$\mathcal{F}_{\mathrm{Dec}_{TT},\kappa}$. Let $\mathcal{Q} = \bigcup_{d \in \mathbb{N}} \mathcal{Q}^{(d)}$ be any query set such that $\mathcal{F}_{\mathrm{Dec}_{TT},d} \subseteq \mathcal{Q}^{(d)}$ for every $d \in \mathbb{N}$. Then there
does not exist any sanitizer $\mathcal{M}$ that is simultaneously 1) differentially private, 2) efficient, and 3)
$(\mathcal{Q}, k_{TT}(d))$-accurate.

In the typical setting of parameters, $n(\kappa) = \mathrm{poly}(\kappa)$, $k_{TT}(\kappa) = \tilde{\Theta}(n^2)$, and decryption can be
implemented by circuits of size $\mathrm{poly}(n) = \mathrm{poly}(d)$. Then Theorem 4.1 will state that there is no
sanitizer $\mathcal{M}$ that takes a database $D \in (\{0,1\}^d)^{\mathrm{poly}(d)}$, runs in $\mathrm{poly}(d)$ time, and answers $\tilde{O}(n^2)$
queries implemented by circuits of size $\mathrm{poly}(d)$, while satisfying differential privacy.

We now give a sketch of the proof, which follows the approach of Dwork et al. [DNR+09]:
For every ciphertext $c \in \mathcal{C}^{(d)}$, consider the query $q_c(x) = \mathrm{Dec}_{TT}(x, c)$ and let $\mathcal{Q}^{(d)}$ contain all of
these queries. Notice that $\mathcal{Q}^{(d)} = \mathcal{F}_{\mathrm{Dec}_{TT},d}$, and assume there is an efficient mechanism $\mathcal{M}$ that is
$(\mathcal{Q}^{(d)}, k_{TT}(d))$-accurate for these queries. The fact that $\mathcal{M}$ is accurate for these queries will imply
that (after small modifications) $\mathcal{M}$ is a $k_{TT}$-useful pirate decoder (Definition 3.1).

Now consider two experiments: In the first, we construct an $n$-row database $D$ by running
$\mathrm{Gen}_{TT}(1^d)$ to obtain $n$ user keys, and putting one in each row of $D$. Then we run $\mathrm{Trace}_{TT}$ on
$\mathcal{M}(D, \cdot)$ and obtain a user $i$. Since $\mathcal{M}$ is useful, and $\Pi_{TT}$ is secure, we will have that $i \in [n]$ with
probability close to 1, and thus there is an $i^* \in [n]$ such that $i = i^*$ with probability close to $1/n$.

In the second experiment, we construct a database $D'$ exactly as in the first, except that we exclude
the key $sk^{(i^*)}$. Since $D$ and $D'$ differ in only one row, differential privacy requires that $\mathrm{Trace}_{TT}$, run
with oracle $\mathcal{M}(D', \cdot)$, still outputs $i^*$ with probability close to $1/n$. However, in this experiment,
the key of user $i^*$, $sk^{(i^*)}$, is no longer given to the pirate decoder, and thus security of $\Pi_{TT}$ says that $\mathrm{Trace}_{TT}$, run
with this oracle, must output $i^*$ with probability $o(1/n)$. Thus, we will obtain a contradiction.






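Proof. Let $\Pi_{TT} = (\mathrm{Gen}_{TT}, \mathrm{Enc}_{TT}, \mathrm{Dec}_{TT}, \mathrm{Trace}_{TT})$ be the assumed traitor-tracing scheme. For
every $d \in \mathbb{N}$, we define the query set

$$\mathcal{Q}^{(d)}_{\mathrm{Dec}_{TT}} = \left\{ q_c(x) = \mathrm{Dec}_{TT}(x, c) \mid c \in \mathcal{C}^{(d)} \right\} = \mathcal{F}_{\mathrm{Dec}_{TT},d}$$

over $\{0,1\}^d$ and let $\mathcal{Q}_{\mathrm{Dec}_{TT}} = \bigcup_{d \in \mathbb{N}} \mathcal{Q}^{(d)}_{\mathrm{Dec}_{TT}}$. Recall that since $\mathrm{Dec}_{TT}$ runs in time $\mathrm{poly}(d, n(d))$, (at
a minimum) $\mathcal{Q}^{(d)}_{\mathrm{Dec}_{TT}} \subseteq \mathcal{Q}^{(s)}$ for some $s(d) = \mathrm{poly}(d, n(d))$.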

Assume there exists an efficient, differentially private, $(\mathcal{Q}_{\mathrm{Dec}_{TT}}, k_{TT})$-accurate sanitizer $\mathcal{M}$. We
define the pirate decoder $\mathcal{P}_{\mathcal{M}}$ as follows:

Algorithm 1 The pirate decoder $\mathcal{P}_{\mathcal{M}}$

Input: A set of user keys $\mathbf{sk}_S = (sk^{(i)})_{i \in S}$, each in $\{0,1\}^d$, and a set of ciphertexts $c_1, \dots, c_{k_{TT}}$ ($k_{TT} = k_{TT}(d)$).
Construct circuits specifying the queries $q_{c_1}, \dots, q_{c_{k_{TT}}} \in \mathcal{Q}^{(d)}_{\mathrm{Dec}_{TT}}$.
Construct a database $D = (sk^{(i)})_{i \in S} \in (\{0,1\}^d)^{|S|}$.
Let $a_1, \dots, a_{k_{TT}} \leftarrow_R \mathcal{M}(D, q_{c_1}, \dots, q_{c_{k_{TT}}})$.
Round the answers $a_1, \dots, a_{k_{TT}}$ to $\{0,1\}$ to obtain $\hat{b}_1, \dots, \hat{b}_{k_{TT}} \in \{0,1\}$.
Output: $\hat{b}_1, \dots, \hat{b}_{k_{TT}}$.

Recall that if $\mathcal{M}$ is efficient, it runs in time $\mathrm{poly}(d, n(d), k_{TT}(d), s(d)) = \mathrm{poly}(d, n(d), k_{TT}(d))$.
In this case $\mathcal{P}_{\mathcal{M}}$ runs in time $\mathrm{poly}(d, n(d), k_{TT}(d))$ as well. To see this, observe that since $\mathrm{Dec}_{TT}$
runs in time $s(d) = \mathrm{poly}(d, n(d))$, we can construct a circuit implementing $\mathrm{Dec}_{TT}(sk, c) = q_c(sk)$
in time $\mathrm{poly}(d, n(d))$. Additionally, the final rounding step can be performed in time $\mathrm{poly}(k_{TT}(d))$.
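The reduction in Algorithm 1 can be sketched as a higher-order function. This is an illustrative sketch under stated assumptions, not the paper's formal object: `sanitizer` is any callable taking a database and a list of row predicates and returning one real-valued answer per query, `dec` is any deterministic decryption function, and `exact_sanitizer` is a hypothetical non-private stand-in used only to exercise the wrapper.

```python
def make_pirate_decoder(sanitizer, dec):
    """Wrap a sanitizer as a pirate decoder, following Algorithm 1."""
    def pirate(keys, ciphertexts):
        # One counting query per ciphertext: q_c(x) = dec(x, c).
        queries = [lambda x, c=c: dec(x, c) for c in ciphertexts]
        # The database has one row per available user key.
        answers = sanitizer(list(keys), queries)
        # Round each approximate answer to a message bit.
        return [1 if a > 0.5 else 0 for a in answers]
    return pirate


def exact_sanitizer(database, queries):
    """Non-private stand-in: returns the exact fraction for each query."""
    return [sum(q(row) for row in database) / len(database) for q in queries]
```

If every available key decrypts a ciphertext to the same bit $b$, the true answer to the corresponding query is exactly $b$, so any sanitizer with error below $1/2$ is rounded back to $b$.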

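Next, we claim that if $\mathcal{M}$ is accurate for $\mathcal{Q}_{\mathrm{Dec}_{TT}}$ then $\mathcal{P}_{\mathcal{M}}$ is a useful pirate decoder.

Claim 4.2. If $\mathcal{M}$ is $(\mathcal{Q}, k_{TT})$-accurate for some $\mathcal{Q} = \bigcup_{d \in \mathbb{N}} \mathcal{Q}^{(d)}$ such that $\mathcal{F}_{\mathrm{Dec}_{TT},d} \subseteq \mathcal{Q}^{(d)}$ for every
$d \in \mathbb{N}$, then $\mathcal{P}_{\mathcal{M}}$ is a $k_{TT}$-useful pirate decoder.

Proof of Claim 4.2. Let $\mathbf{sk}$ be a set of user keys for $\Pi_{TT}$, each in $\{0,1\}^d$, and let $S \subseteq [n]$ be a subset of
the users such that $|S| \ge n - 1$. Suppose $c$ and $b$ are such that for every $i \in S$, $\mathrm{Dec}_{TT}(sk^{(i)}, c) = b$.
Then we have that, for $D$ as in $\mathcal{P}_{\mathcal{M}}$,

$$q_c(D) = \frac{1}{|S|} \sum_{i \in S} q_c(sk^{(i)}) = \frac{1}{|S|} \sum_{i \in S} \mathrm{Dec}_{TT}(sk^{(i)}, c) = b.$$

Let $c_1, \dots, c_{k_{TT}}$ be a set of ciphertexts and let $q_{c_1}, \dots, q_{c_{k_{TT}}}$ be as in $\mathcal{P}_{\mathcal{M}}$. Let $a_1, \dots, a_{k_{TT}} \leftarrow_R
\mathcal{M}(D, q_{c_1}, \dots, q_{c_{k_{TT}}})$. The accuracy of $\mathcal{M}$ (with constant error $\alpha < 1/2$) guarantees that

$$\Pr\left[ \exists j \in [k_{TT}],\ |a_j - q_{c_j}(D)| > 1/2 \right] = o(1/|S|^2)$$

$$\Longrightarrow\quad o(1/n^2) = \Pr\left[ \exists j \in [k_{TT}],\ |a_j - q_{c_j}(D)| > 1/2 \right]$$

$$\ge \Pr\left[ \exists j \in [k_{TT}],\ b \in \{0,1\} : (q_{c_j}(D) = b) \wedge (|a_j - b| > 1/2) \right]$$

$$= \Pr\left[ \exists j \in [k_{TT}],\ b \in \{0,1\} : (q_{c_j}(D) = b) \wedge (\mathrm{round}(a_j) \ne b) \right]$$

$$= \Pr\left[ \exists j \in [k_{TT}],\ b \in \{0,1\} : \left( \forall i \in S,\ \mathrm{Dec}_{TT}(sk^{(i)}, c_j) = b \right) \wedge \left( \hat{b}_j \ne b \right) \right],$$

where $\hat{b}_1, \dots, \hat{b}_{k_{TT}}$ is the output of $\mathcal{P}_{\mathcal{M}}(\mathbf{sk}_S, c_1, \dots, c_{k_{TT}})$. The last equality holds because an average of bits equals $b \in \{0,1\}$ exactly when every bit equals $b$.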

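Thus, $\mathcal{P}_{\mathcal{M}}$ is $k_{TT}$-useful. This completes the proof of the claim. □

Since $\mathcal{P}_{\mathcal{M}}$ is a $k_{TT}$-useful pirate decoder, and $\Pi_{TT}$ is an $(n, k_{TT})$-secure traitor-tracing scheme,
we have that

$$\Pr_{\substack{\mathbf{sk} \leftarrow_R \mathrm{Gen}_{TT}(1^\kappa) \\ \mathcal{P}_{\mathcal{M}}\text{'s},\ \mathrm{Trace}_{TT}\text{'s coins}}} \left[ \mathrm{Trace}_{TT}^{\mathcal{P}_{\mathcal{M}}(\mathbf{sk}, \cdot)}(\mathbf{sk}) \in [n(\kappa)] \right] \ge 1 - o\left( \frac{1}{n(\kappa)} \right).$$

Thus, for every $\kappa \in \mathbb{N}$, there exists $i^*(\kappa) \in [n(\kappa)]$ such that

$$\Pr_{\substack{\mathbf{sk} \leftarrow_R \mathrm{Gen}_{TT}(1^\kappa) \\ \mathcal{P}_{\mathcal{M}}\text{'s},\ \mathrm{Trace}_{TT}\text{'s coins}}} \left[ \mathrm{Trace}_{TT}^{\mathcal{P}_{\mathcal{M}}(\mathbf{sk}, \cdot)}(\mathbf{sk}) = i^*(\kappa) \right] \ge \frac{1 - o(1/n(\kappa))}{n(\kappa)}. \qquad (1)$$

Let $S(\kappa) = [n(\kappa)] \setminus \{i^*(\kappa)\}$. Now we claim that if $\mathcal{M}$ is differentially private, then $\mathrm{Trace}_{TT}$ will
output $i^*(\kappa)$ with significant probability, even if $\mathcal{P}_{\mathcal{M}}$ is given the set of keys $\mathbf{sk}_{S(\kappa)}$, rather than $\mathbf{sk}$.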

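Claim 4.3. If $\mathcal{M}$ is differentially private, then

$$\Pr_{\substack{\mathbf{sk} \leftarrow_R \mathrm{Gen}_{TT}(1^\kappa) \\ \mathcal{P}_{\mathcal{M}}\text{'s},\ \mathrm{Trace}_{TT}\text{'s coins}}} \left[ \mathrm{Trace}_{TT}^{\mathcal{P}_{\mathcal{M}}(\mathbf{sk}_{S(\kappa)}, \cdot)}(\mathbf{sk}) = i^*(\kappa) \right] \ge \Omega\left( \frac{1}{n(\kappa)} \right).$$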



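Proof of Claim 4.3. Fix any $\kappa$ and let $k_{TT} = k_{TT}(\kappa)$, $i^* = i^*(\kappa)$, and $S = S(\kappa)$ as above. Also
fix any $\mathbf{sk}, c_1, \dots, c_{k_{TT}}$, and let $q_{c_1}, \dots, q_{c_{k_{TT}}}$ be the queries constructed in the execution of $\mathcal{P}_{\mathcal{M}}$
with $c_1, \dots, c_{k_{TT}}$ as input. Let $D = \mathbf{sk}$ and $D_{-i^*} = \mathbf{sk}_S$. By differential privacy (for $\varepsilon = O(1)$,
$\delta = o(1/n)$), we have that for every $T \subseteq \mathbb{R}^{k_{TT}}$,

$$\Pr\left[ \mathcal{M}(D, q_{c_1}, \dots, q_{c_{k_{TT}}}) \in T \right] \le e^{O(1)} \cdot \Pr\left[ \mathcal{M}(D_{-i^*}, q_{c_1}, \dots, q_{c_{k_{TT}}}) \in T \right] + o\left( \frac{1}{n} \right).$$

Note that the queries constructed by $\mathcal{P}_{\mathcal{M}}$ depend only on $c_1, \dots, c_{k_{TT}}$, not on $\mathbf{sk}_S$. Also note that
the final rounding step does not depend on the input at all. Thus, for every $T \subseteq \{0,1\}^{k_{TT}}$,

$$\Pr\left[ \mathcal{P}_{\mathcal{M}}(\mathbf{sk}, c_1, \dots, c_{k_{TT}}) \in T \right] \le e^{O(1)} \cdot \Pr\left[ \mathcal{P}_{\mathcal{M}}(\mathbf{sk}_S, c_1, \dots, c_{k_{TT}}) \in T \right] + o\left( \frac{1}{n} \right). \qquad (2)$$

Now take $T = T(\mathbf{sk}, c_1, \dots, c_{k_{TT}})$ to be the set of responses $\hat{b}_1, \dots, \hat{b}_{k_{TT}}$ such that $\mathrm{Trace}_{TT}(\mathbf{sk})$,
after querying its oracle on ciphertexts $c_1, \dots, c_{k_{TT}}$ and receiving responses $\hat{b}_1, \dots, \hat{b}_{k_{TT}}$, outputs $i^*$.
The claim now follows by applying (2) to $T$, averaging over the randomness of $\mathbf{sk}, c_1, \dots, c_{k_{TT}}$, and
finally combining with (1). □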

To complete the proof, notice that the probability in Claim 4.3 is exactly the probability that
$\mathrm{Trace}_{TT}$ outputs the user $i^*$, when given the oracle $\mathcal{P}_{\mathcal{M}}(\mathbf{sk}_S, \cdot)$, for $S = [n] \setminus \{i^*\}$. However, the
fact that $\mathcal{P}_{\mathcal{M}}$ is efficient, and $\Pi_{TT}$ is a secure traitor-tracing scheme, implies that this probability
is $o(1/n)$. Thus we have obtained a contradiction. This completes the proof of the Theorem. □



5 Constructions of Traitor-Tracing Schemes

In this section we show how to construct traitor-tracing schemes that satisfy Definition 3.2, and thus 
can be used to instantiate Theorem 4.1. Our construction will make use of an encryption scheme 
(Definition 3.6) and a fingerprinting code (Definition 3.3). First we will state the construction and
then prove its security (Section 5.1), using the encryption scheme only as a black box. Then we
will consider some possible choices for the encryption scheme, and analyze the resulting decryption 
function family for the traitor-tracing scheme (Section 5.2). 

The encryption and decryption functionality of our traitor-tracing scheme, $\Pi_{TT}$, is a standard
construction: Let $\Pi_{Enc}$ be an encryption scheme. To generate keys for $\Pi_{TT}$, generate $n$ independent
user keys $sk^{(1)}, \dots, sk^{(n)}$ for $\Pi_{Enc}$, and give one to each user. To encrypt a bit $b$ to all of the users,
encrypt it using $\Pi_{Enc}$ under each user key $sk^{(i)}$ and concatenate the ciphertexts. To decrypt a
ciphertext using user key $sk^{(i)}$, find the $i$-th block of the ciphertext and decrypt it using $\Pi_{Enc}$ and
user key $sk^{(i)}$. We give a more formal specification of the construction below. It will be clear from
the construction that $(\mathrm{Gen}_{TT}, \mathrm{Enc}_{TT}, \mathrm{Dec}_{TT})$ satisfies the expected properties of encryption.

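Algorithm 2 The algorithms $(\mathrm{Gen}_{TT}, \mathrm{Enc}_{TT}, \mathrm{Dec}_{TT})$ for $\Pi_{TT}$. Let an encryption scheme $\Pi_{Enc} =
(\mathrm{Gen}, \mathrm{Enc}, \mathrm{Dec})$ and a function $n : \mathbb{N} \to \mathbb{N}$ be parameters of the scheme.

$\mathrm{Gen}_{TT}(1^\kappa)$:
    Let: $n = n(\kappa)$, $\hat{\kappa} = \kappa - \lceil \log n \rceil$
    For: $i = 1, 2, \dots, n$
        Let: $\hat{sk}^{(i)} \leftarrow_R \mathrm{Gen}(1^{\hat{\kappa}})$
        Let: $sk^{(i)} = (i, \hat{sk}^{(i)})$
    EndFor
    Output: $(sk^{(1)}, \dots, sk^{(n)})$

$\mathrm{Enc}_{TT}(sk^{(1)}, \dots, sk^{(n)}, b)$:
    For: $i = 1, 2, \dots, n$
        Let: $(i, \hat{sk}^{(i)}) = sk^{(i)}$
        Let: $c^{(i)} \leftarrow_R \mathrm{Enc}(\hat{sk}^{(i)}, b)$
    EndFor
    Output: $c = (c^{(1)}, \dots, c^{(n)})$

$\mathrm{Dec}_{TT}(sk, c)$:
    Let: $c = (c^{(1)}, \dots, c^{(n)})$
    Let: $(i, \hat{sk}) = sk$
    Output: $b = \mathrm{Dec}(\hat{sk}, c^{(i)})$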
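The three algorithms above translate almost line for line into code. The sketch below is a toy illustration only: the bit-encryption scheme (`gen`, `enc`, `dec`) masks the message bit with one bit of SHA-256 applied to the key and a fresh nonce, which is a stand-in chosen for brevity and is not one of the encryption schemes considered in the paper. All function names are hypothetical, users are indexed from 0, and the key-length adjustment $\kappa - \lceil \log n \rceil$ is omitted.

```python
import hashlib
import secrets


def gen(key_bytes=16):
    """Toy Gen: sample a uniformly random secret key."""
    return secrets.token_bytes(key_bytes)


def _pad_bit(sk, nonce):
    """One pseudorandom mask bit derived from the key and the nonce."""
    return hashlib.sha256(sk + nonce).digest()[0] & 1


def enc(sk, b):
    """Toy Enc: ciphertext is (nonce, b XOR mask bit)."""
    nonce = secrets.token_bytes(16)
    return (nonce, b ^ _pad_bit(sk, nonce))


def dec(sk, c):
    """Toy Dec: deterministic and perfectly correct for honest ciphertexts."""
    nonce, masked = c
    return masked ^ _pad_bit(sk, nonce)


def gen_tt(n):
    """Gen_TT: n independent keys, each tagged with its user index."""
    return [(i, gen()) for i in range(n)]


def enc_tt(keys, b):
    """Enc_TT: encrypt b under every user key and concatenate the blocks."""
    return [enc(sk_i, b) for (_, sk_i) in keys]


def dec_tt(user_key, c):
    """Dec_TT: decrypt only the block that belongs to this user."""
    i, sk_i = user_key
    return dec(sk_i, c[i])
```

Because each user reads only its own block, the decryption function depends on a single block of the ciphertext, which is what keeps the resulting counting queries simple.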



Although the construction of the scheme is standard, we will need to show that it can be traced
in our model using only a small number of non-adaptive queries. First we define a subroutine,
$\mathrm{TrEnc}_{TT}$, that "encrypts a matrix" $B \in \{0,1\}^{n \times k}$. We will use $b_j \in \{0,1\}^n$ to denote the $j$-th
column of $B$, $b^{(i)} \in \{0,1\}^k$ to denote the $i$-th row of $B$, and $b^{(i)}_j$ to denote the $(i,j)$-th bit of $B$.

The algorithm will generate $k$ ciphertexts $c_1, \dots, c_k$. Each ciphertext $c_j$ corresponds to a column
of $B$, and each block $c^{(i)}_j$ of the ciphertext $c_j$ contains an encryption using Enc and $sk^{(i)}$ of the bit $b^{(i)}_j$.
Notice that if $k = 1$ (the matrix has only one column) and every row contains the same bit $b$, then
$\mathrm{TrEnc}_{TT}(\mathbf{sk}, B)$ is distributed identically to $\mathrm{Enc}_{TT}(\mathbf{sk}, b)$.

Algorithm 3 The subroutine $\mathrm{TrEnc}_{TT}$.

$\mathrm{TrEnc}_{TT}(sk^{(1)}, \dots, sk^{(n)}, B)$:
    Let: $B = (b_1, \dots, b_k)$ be the columns of $B$.
    For: $j = 1, \dots, k$
        For: $i = 1, \dots, n$
            Let: $(i, \hat{sk}^{(i)}) = sk^{(i)}$
            Let: $c^{(i)}_j \leftarrow_R \mathrm{Enc}(\hat{sk}^{(i)}, b^{(i)}_j)$
        EndFor
        Let: $c_j = (c^{(1)}_j, \dots, c^{(n)}_j)$
    EndFor
    Output: $c_1, \dots, c_k$

Now we can specify the tracing algorithm for $\Pi_{TT}$. Let $\Pi_{FP}$ be a fingerprinting code. First,
$\mathrm{Trace}_{TT}$ will generate a codebook $W$ for the fingerprinting code and then run $\mathrm{TrEnc}_{TT}(\mathbf{sk}, W)$ to
obtain its challenge ciphertexts. Then, $\mathrm{Trace}_{TT}$ queries its oracle on these ciphertexts and receives
$\hat{b}_1, \dots, \hat{b}_{\ell_{FP}}$ in response. Finally, it treats these responses as a word $w'$, runs the tracing algorithm
for the fingerprinting code, $\mathrm{Trace}_{FP}$, on $w'$, and returns the output of $\mathrm{Trace}_{FP}$ as its own output.

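Algorithm 4 The tracing algorithm for $\Pi_{TT}$. Let a length $\ell_{FP} = \ell_{FP}(n)$ fingerprinting code
$\Pi_{FP} = (\mathrm{Gen}_{FP}, \mathrm{Trace}_{FP})$ and an encryption scheme $\Pi_{Enc} = (\mathrm{Gen}, \mathrm{Enc}, \mathrm{Dec})$ be parameters of the
scheme.

$\mathrm{Trace}_{TT}^{\mathcal{P}}(\mathbf{sk})$:
    Let: $n = n(\kappa)$, $\ell_{FP} = \ell_{FP}(n)$
    Let: $W \leftarrow_R \mathrm{Gen}_{FP}(1^n)$
    Let: $c_1, \dots, c_{\ell_{FP}} \leftarrow_R \mathrm{TrEnc}_{TT}(\mathbf{sk}, W)$
    Let: $\hat{b}_1, \dots, \hat{b}_{\ell_{FP}} \leftarrow_R \mathcal{P}(c_1, \dots, c_{\ell_{FP}})$
    Let: $i \leftarrow_R \mathrm{Trace}_{FP}(W, w')$ where $w' = \hat{b}_1 \| \dots \| \hat{b}_{\ell_{FP}}$ is the concatenation of $\hat{b}_1, \dots, \hat{b}_{\ell_{FP}}$
    Output: $i$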
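The matrix-encryption subroutine and the tracing algorithm can be sketched generically, with the encryption scheme and the fingerprinting code passed in as parameters. This is an illustrative skeleton, not a secure implementation: `tr_enc` and `trace_tt` are hypothetical names, keys are assumed to be `(index, secret)` pairs as in Algorithm 2 with indices starting at 0, and `pirate` is any callable that maps the list of challenge ciphertexts to a list of bits.

```python
def tr_enc(keys, B, enc):
    """Encrypt an n x k bit matrix column by column.

    Ciphertext j has one block per user; block i encrypts B[i][j]
    under user i's secret key.
    """
    n, k = len(B), len(B[0])
    ciphertexts = []
    for j in range(k):
        c_j = [enc(keys[i][1], B[i][j]) for i in range(n)]
        ciphertexts.append(c_j)
    return ciphertexts


def trace_tt(keys, pirate, gen_fp, trace_fp, enc):
    """Trace a pirate decoder with one batch of non-adaptive queries."""
    n = len(keys)
    W = gen_fp(n)                        # fingerprinting codebook, one row per user
    challenges = tr_enc(keys, W, enc)    # one ciphertext per codeword position
    responses = pirate(challenges)       # the pirate sees all queries at once
    return trace_fp(W, list(responses))  # responses form the word w'
```

A user holding only key $i$ decrypts challenge $j$ to the bit $W_{i,j}$, so an honest single-user decoder reproduces codeword $i$ exactly; the collusion-resilient case relies on the security of the fingerprinting code that is plugged in.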



5.1 Security of $\Pi_{TT}$

In this section we will prove that our construction of $\Pi_{TT} = (\mathrm{Gen}_{TT}, \mathrm{Enc}_{TT}, \mathrm{Dec}_{TT}, \mathrm{Trace}_{TT})$ is an
$(n, \ell_{FP}(n))$-secure traitor-tracing scheme. It can be verified from the specification of the scheme
that it has the desired syntactic properties, that it generates $n(\kappa)$ user keys, and that the tracing
algorithm makes $\ell_{FP}(n(\kappa))$ non-adaptive queries to its oracle.

Before proving the security of the scheme, we will give some intuition for why a pirate decoder
$\mathcal{P}(\mathbf{sk}_S, \cdot)$ can be successfully traced. The algorithm $\mathrm{Trace}_{TT}$ is going to generate queries $c_1, \dots, c_{\ell_{FP}}$
using $\mathrm{TrEnc}_{TT}(\mathbf{sk}, W)$, for a fingerprinting codebook $W$. Let $\mathrm{Crit}(W_S)$ denote the set of columns $j$
on which all codewords in $W_S$ agree. By the construction of $\mathrm{TrEnc}_{TT}$ and the
correctness of $\Pi_{Enc}$, if $j \in \mathrm{Crit}(W_S)$, then every user $i \in S$ will decrypt $c_j$ to the same bit $b_j = w^{(i)}_j$.
If $\mathcal{P}$ is an $\ell_{FP}$-useful pirate decoder, then with high probability it answers the queries with $\hat{b}_1, \dots, \hat{b}_{\ell_{FP}}$
such that for every $j \in \mathrm{Crit}(W_S)$, $\hat{b}_j = b_j$. Thus $w'$ will be in $F(W_S)$ with high probability. The
security of the fingerprinting code now implies that $i \in S$ with high probability.

The catch in this argument is that $\mathrm{TrEnc}_{TT}$ takes all of $W$ as input, whereas an attacker for
the fingerprinting code is only allowed to see $W_S$, and thus cannot simulate $\mathrm{TrEnc}_{TT}$ in a security
reduction. However, if $\mathcal{P}$ only has keys $\mathbf{sk}_S$, and $i \notin S$, then an efficient $\mathcal{P}$ cannot decrypt the $i$-th
block of a ciphertext $c = (c^{(1)}, \dots, c^{(n)})$. Notice that the ciphertext components $c^{(i)}_1, \dots, c^{(i)}_{\ell_{FP}}$ are
exactly those that contain encryptions of the $i$-th row of $W$, which is the $i$-th codeword in $W$. Thus
the $i$-th codeword is computationally hidden from $\mathcal{P}$ anyway, and we could replace that codeword
with the all-zeros codeword without significantly affecting the success probability of $\mathcal{P}$. Formalizing
this intuition will yield a valid attacker for the fingerprinting code, and obtain a contradiction.
We now formalize this intuition and prove the following theorem:

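Theorem 5.1 (From Encryption to Traitor-Tracing). Assume that there exists an $(\varepsilon_{Enc}, k_{Enc}, T_{Enc})$-
secure encryption scheme, $\Pi_{Enc}$, and an $(\varepsilon_{FP}, \ell_{FP})$-fingerprinting code, $\Pi_{FP}$. Let $n : \mathbb{N} \to \mathbb{N}$ and
$k_{TT} : \mathbb{N} \to \mathbb{N}$ be any functions such that for every $\kappa \in \mathbb{N}$

1. $n(\kappa) \cdot \varepsilon_{Enc}(\kappa) + \varepsilon_{FP}(n(\kappa)) = o\left( \frac{1}{n(\kappa)^2} \right)$,

2. $k_{Enc}(\kappa) \ge \ell_{FP}(n(\kappa)) = k_{TT}(\kappa)$, and

3. $\mathrm{poly}(\kappa, n(\kappa), k_{TT}(\kappa)) \le T_{Enc}(\kappa - \lceil \log(n(\kappa)) \rceil, k_{TT}(\kappa))$.

Then $\Pi_{TT} = (\mathrm{Gen}_{TT}, \mathrm{Enc}_{TT}, \mathrm{Dec}_{TT}, \mathrm{Trace}_{TT})$ with parameters $n, k_{TT}, \Pi_{Enc}, \Pi_{FP}$ is a secure $(n, k_{TT})$-traitor-
tracing scheme.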

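Proof. Suppose there exists a $\mathrm{poly}(\kappa, n(\kappa), k_{TT}(\kappa))$-time pirate decoder $\mathcal{P}$ that violates the security
of $\Pi_{TT}$. That is, for every $\kappa \in \mathbb{N}$, there exists $S = S(\kappa) \subseteq [n(\kappa)]$, $|S(\kappa)| \ge n(\kappa) - 1$, such that

$$\Pr_{\mathbf{sk} \leftarrow_R \mathrm{Gen}_{TT}(1^\kappa)} \left[ \mathrm{Trace}_{TT}^{\mathcal{P}(\mathbf{sk}_{S(\kappa)}, \cdot)}(\mathbf{sk}) \notin S(\kappa) \right] = \Omega\left( \frac{1}{n(\kappa)} \right),$$

where the probability is also taken over the coins of $\mathcal{P}$ and $\mathrm{Trace}_{TT}$. Thus, for a randomly chosen
$S(\kappa) \subseteq [n(\kappa)]$ such that $|S(\kappa)| \ge n(\kappa) - 1$,

$$\Pr_{\mathbf{sk} \leftarrow_R \mathrm{Gen}_{TT}(1^\kappa),\ S(\kappa)} \left[ \mathrm{Trace}_{TT}^{\mathcal{P}(\mathbf{sk}_{S(\kappa)}, \cdot)}(\mathbf{sk}) \notin S(\kappa) \right] = \Omega\left( \frac{1}{n(\kappa)^2} \right),$$

where this probability is also taken over the coins of $\mathcal{P}$ and $\mathrm{Trace}_{TT}$. We will show that such a
pirate decoder must either violate the security of the encryption scheme or violate the security of
the fingerprinting code.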

First we define some notation for specifying the adversaries used in the argument. Let $W =
(w^{(1)}, \dots, w^{(n)}) \in \{0,1\}^{n \times \ell_{FP}}$ be a codebook for the fingerprinting code, represented as a matrix with
the $i$-th row containing the $i$-th codeword. For $S \subseteq [n]$, $|S| = n - 1$, where $S = [n] \setminus \{i\}$, and codebook
$W$, we write $W_S = (w^{(1)}, \dots, w^{(i-1)}, w^{(i+1)}, \dots, w^{(n)}) \in \{0,1\}^{(n-1) \times \ell_{FP}}$ to be the codebook $W$ with
the $i$-th row removed. We also write $\widetilde{W}_S = (w^{(1)}, \dots, w^{(i-1)}, 0, w^{(i+1)}, \dots, w^{(n)}) \in \{0,1\}^{n \times \ell_{FP}}$ to be
the codebook $W$ with the $i$-th row present but replaced with the all-zeros codeword. Note that $\widetilde{W}_S$
can be computed from $W_S$ alone, even though $W$ cannot. If $S = [n]$ we will use the same notation;
however, in this case notice that $W = W_S = \widetilde{W}_S$.

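Consider the following algorithm $\mathcal{A}^{\mathcal{P}}_{FP}$.

Algorithm 5 The fingerprinting security adversary $\mathcal{A}^{\mathcal{P}}_{FP}$

$\mathcal{A}^{\mathcal{P}}_{FP}(S, W_S)$:
    Let $n$ be the number of users for the fingerprinting code, and choose $\kappa$ such that $n(\kappa) = n$
    Let: $\mathbf{sk} = (sk^{(1)}, \dots, sk^{(n)}) \leftarrow_R \mathrm{Gen}_{TT}(1^\kappa)$
    Let: $(c_1, \dots, c_{\ell_{FP}}) \leftarrow_R \mathrm{TrEnc}_{TT}(\mathbf{sk}, \widetilde{W}_S)$
    Let: $(\hat{b}_1, \dots, \hat{b}_{\ell_{FP}}) \leftarrow_R \mathcal{P}(\mathbf{sk}_S, c_1, \dots, c_{\ell_{FP}})$
    Output: $w' = \hat{b}_1 \| \dots \| \hat{b}_{\ell_{FP}}$, the concatenation of $\hat{b}_1, \dots, \hat{b}_{\ell_{FP}}$.

Since $\mathcal{A}^{\mathcal{P}}_{FP}$ is a valid attacker in the fingerprinting security game for every $\mathcal{P}$ (it only takes
$(S, W_S)$ as input), for every $S \subseteq [n]$, $S \ne \emptyset$,

$$\Pr_{\substack{W \leftarrow_R \mathrm{Gen}_{FP}(1^n) \\ w' \leftarrow_R \mathcal{A}^{\mathcal{P}}_{FP}(S, W_S)}} \left[ w' \in F(W_S) \wedge \mathrm{Trace}_{FP}(W, w') \notin S \right] \le \varepsilon_{FP}(n) = \varepsilon_{FP}(n(\kappa)).$$

Thus, for randomly chosen $S = S(\kappa) \subseteq [n(\kappa)]$, $|S| \ge n(\kappa) - 1$,

$$\Pr_{\substack{W \leftarrow_R \mathrm{Gen}_{FP}(1^n),\ S \\ w' \leftarrow_R \mathcal{A}^{\mathcal{P}}_{FP}(S, W_S)}} \left[ w' \in F(W_S) \wedge \mathrm{Trace}_{FP}(W, w') \notin S \right] \le \varepsilon_{FP}(n) = \varepsilon_{FP}(n(\kappa)). \qquad (3)$$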

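The next claim states that if $\mathcal{P}$ is an $\ell_{FP}$-useful pirate decoder, then $w' \in F(W_S)$ with high
probability.

Claim 5.2. Let $k_{TT} = k_{TT}(\kappa) = \ell_{FP}(n(\kappa))$ for every $\kappa \in \mathbb{N}$. If $\mathcal{P}$ is a $k_{TT}$-useful pirate decoder,
then for every $\kappa \in \mathbb{N}$, $S = S(\kappa) \subseteq [n(\kappa)]$ such that $|S| \ge n(\kappa) - 1$, and $W \in \{0,1\}^{n \times \ell_{FP}(n)}$ (for
$n = n(\kappa)$),

$$\Pr_{w' \leftarrow_R \mathcal{A}^{\mathcal{P}}_{FP}(S, W_S)} \left[ w' \notin F(W_S) \right] = o\left( \frac{1}{n(\kappa)^2} \right).$$

Proof of Claim 5.2. If $\mathcal{P}$ is $k_{TT}$-useful, then, in particular, for any $\mathbf{sk} = (sk^{(1)}, \dots, sk^{(n)})$, any
$S \subseteq [n]$ such that $|S| \ge n - 1$, any ciphertexts $c_1, \dots, c_{k_{TT}}$, and $\hat{b}_1, \dots, \hat{b}_{k_{TT}} \leftarrow_R \mathcal{P}(\mathbf{sk}_S, c_1, \dots, c_{k_{TT}})$,

$$\Pr\left[ \exists j \in [k_{TT}],\ b \in \{0,1\} : \left( \forall i \in S,\ \mathrm{Dec}_{TT}(sk^{(i)}, c_j) = b \right) \wedge \left( \hat{b}_j \ne b \right) \right] = o\left( \frac{1}{n^2} \right). \qquad (4)$$

Consider any $j \in \mathrm{Crit}(W_S)$. There exists $b_j$ such that for every $i \in S$, $w^{(i)}_j = b_j$. Let $c_1, \dots, c_{\ell_{FP}} \leftarrow_R
\mathrm{TrEnc}_{TT}(\mathbf{sk}, \widetilde{W}_S)$ and $c_j = (c^{(1)}_j, \dots, c^{(n)}_j)$. By construction of $\mathrm{TrEnc}_{TT}$, for every $i \in S$, $c^{(i)}_j$ is
distributed as $\mathrm{Enc}(\hat{sk}^{(i)}, w^{(i)}_j) = \mathrm{Enc}(\hat{sk}^{(i)}, b_j)$. Thus, by the correctness of $\Pi_{Enc}$, for every $i \in S$,
$\mathrm{Dec}(\hat{sk}^{(i)}, c^{(i)}_j) = b_j$, and by the construction of $\Pi_{TT}$, for every $i \in S$, $\mathrm{Dec}_{TT}(sk^{(i)}, c_j) = b_j$. Thus,
by (4),

$$\Pr_{w' \leftarrow_R \mathcal{A}^{\mathcal{P}}_{FP}(S, W_S)} \left[ w' \notin F(W_S) \right] = \Pr\left[ \exists j \in \mathrm{Crit}(W_S) \text{ s.t. } \hat{b}_j \ne b_j \right]$$

$$\le \Pr\left[ \exists j \in [k_{TT}] : \left( \forall i \in S,\ \mathrm{Dec}_{TT}(sk^{(i)}, c_j) = b_j \right) \wedge \left( \hat{b}_j \ne b_j \right) \right] = o\left( \frac{1}{n(\kappa)^2} \right),$$

where the latter two probabilities are taken over the random choice of $\mathbf{sk} \leftarrow_R \mathrm{Gen}_{TT}(1^\kappa)$, the
randomness of the encryptions $c_1, \dots, c_{k_{TT}} \leftarrow_R \mathrm{TrEnc}_{TT}(\mathbf{sk}, \widetilde{W}_S)$, and the randomness of the pirate
$\hat{b}_1, \dots, \hat{b}_{k_{TT}} \leftarrow_R \mathcal{P}(\mathbf{sk}_S, c_1, \dots, c_{k_{TT}})$. This completes the proof of the claim. □

Combining this claim with (3) we obtain

$$\varepsilon_{FP}(n(\kappa)) \ge \Pr_{\substack{W \leftarrow_R \mathrm{Gen}_{FP}(1^n),\ S \\ w' \leftarrow_R \mathcal{A}^{\mathcal{P}}_{FP}(S, W_S)}} \left[ w' \in F(W_S) \wedge \mathrm{Trace}_{FP}(W, w') \notin S \right]$$

$$\ge \Pr_{\substack{W \leftarrow_R \mathrm{Gen}_{FP}(1^n),\ S \\ w' \leftarrow_R \mathcal{A}^{\mathcal{P}}_{FP}(S, W_S)}} \left[ \mathrm{Trace}_{FP}(W, w') \notin S \right] - o\left( \frac{1}{n(\kappa)^2} \right). \qquad (5)$$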

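Now consider the algorithm $\widetilde{\mathcal{A}}^{\mathcal{P}}_{FP}(S, W)$, which runs exactly as $\mathcal{A}^{\mathcal{P}}_{FP}(S, W_S)$ but encrypts the
matrix $W$ instead of $\widetilde{W}_S$. From the construction of $\Pi_{TT}$, it can be verified that for every $S \subseteq
[n]$, $|S| \ge n - 1$,

$$\Pr\left[ \mathrm{Trace}_{TT}^{\mathcal{P}(\mathbf{sk}_S, \cdot)}(\mathbf{sk}) \notin S \right] \le n \cdot \Pr_{S}\left[ \mathrm{Trace}_{TT}^{\mathcal{P}(\mathbf{sk}_S, \cdot)}(\mathbf{sk}) \notin S \right]$$

$$\le n \cdot \Pr_{\substack{W \leftarrow_R \mathrm{Gen}_{FP}(1^n),\ S \\ w' \leftarrow_R \widetilde{\mathcal{A}}^{\mathcal{P}}_{FP}(S, W)}} \left[ \mathrm{Trace}_{FP}(W, w') \notin S \right].$$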

Thus, in order to complete the proof, it will be sufficient to prove the following claim. 

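Claim 5.3. If $\Pi_{Enc}$ is $(\varepsilon_{Enc}, \ell_{Enc}, T_{Enc})$-secure for $\ell_{Enc}, T_{Enc}$ as in the statement of the Theorem,
then for every $\mathrm{poly}(\kappa, n(\kappa), k_{TT}(\kappa))$-time pirate decoder $\mathcal{P}$,

$$\left|\Pr_{w' \leftarrow_R \mathcal{A}_{FP}(S, W_S)}\left[\mathrm{Trace}_{FP}(W, w') \notin S\right] - \Pr_{w' \leftarrow_R \mathcal{A}'_{FP}(S, W)}\left[\mathrm{Trace}_{FP}(W, w') \notin S\right]\right| \leq \varepsilon_{Enc}(\kappa)$$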



Proof of Claim 5.3. Let $\Pi_{Enc} = (\mathrm{Gen}, \mathrm{Enc}, \mathrm{Dec})$ be the encryption scheme. Consider a random set
of ciphertexts $c_1^{(i)}, \dots, c_{\ell_{FP}}^{(i)}$ for $\ell_{FP} = \ell_{FP}(n(\kappa))$ and for $i$ to be chosen later. The ciphertexts can be
generated in one of two ways. In either case, generate a random key $sk^{(i)} \leftarrow_R \mathrm{Gen}(1^\kappa)$.

• In the first case, which we call the "all-zeros case", $c_1^{(i)} \leftarrow_R \mathrm{Enc}(sk^{(i)}, 0), \dots, c_{\ell_{FP}}^{(i)} \leftarrow_R \mathrm{Enc}(sk^{(i)}, 0)$.

• In the second case, which we call the "codeword case", there is a codeword $w^{(i)} \in \{0,1\}^{\ell_{FP}}$
and $c_1^{(i)} \leftarrow_R \mathrm{Enc}(sk^{(i)}, w^{(i)}_1), \dots, c_{\ell_{FP}}^{(i)} \leftarrow_R \mathrm{Enc}(sk^{(i)}, w^{(i)}_{\ell_{FP}})$.

Now consider the algorithm $\mathcal{A}_{Enc}$ (Algorithm 6). We prove the claim by the following series of observations
about $\mathcal{A}_{Enc}$: 1) The challenge ciphertexts $c_1^{(i)}, \dots, c_{\ell_{FP}}^{(i)}$ are distributed either as in the all-zeros case
or as in the codeword case described above. 2) Since $\mathrm{Gen}_{TT}$ generates $n$ independent keys from Gen,
and in either case of the challenge ciphertexts the key is chosen independently from Gen,
all the keys used in $\mathcal{A}_{Enc}$ have exactly the same distribution as $sk \leftarrow_R \mathrm{Gen}_{TT}$. 3) In the all-zeros
case, the ciphertexts $c_1, \dots, c_{\ell_{FP}}$ are distributed exactly as the ciphertexts in $\mathcal{A}_{FP}(S, W_S)$, and in
the codeword case, the ciphertexts are distributed exactly as the ciphertexts in $\mathcal{A}'_{FP}(S, W)$. 4) If $\mathcal{P}$
runs in $\mathrm{poly}(\kappa, n(\kappa), k_{TT}(\kappa))$-time, then so does $\mathcal{A}_{Enc}$; thus, by the condition on $T_{Enc}$, $\mathcal{A}_{Enc}$ is a
valid attacker for the encryption scheme.

From these observations, we conclude that if the claim is false, then $\mathcal{A}_{Enc}$ violates the security
of $\Pi_{Enc}$. □

We now complete the proof of the theorem by combining Equation (5) and Claim 5.3. 

□ 
Algorithm 6 The encryption adversary $\mathcal{A}_{Enc}$

$\mathcal{A}_{Enc}(1^\kappa)$:
  Let: $n = n(\kappa)$, $\ell_{FP} = \ell_{FP}(n)$, $i \leftarrow_R [n]$, let $S = [n] \setminus \{i\}$
  Let: choose $n - 1$ user keys $sk_S \leftarrow_R \mathrm{Gen}(1^\kappa)$, independently
  Let: $W \leftarrow_R \mathrm{Gen}_{FP}(1^n)$
  Request encryptions for either message $0^{\ell_{FP}} \in \{0,1\}^{\ell_{FP}}$ or for $w^{(i)}$ from $\Pi_{Enc}$
  Receive a sequence of challenge ciphertexts $c_1^{(i)}, \dots, c_{\ell_{FP}}^{(i)}$
  Construct a sequence of ciphertexts $c_1, \dots, c_{\ell_{FP}}$ mimicking $\mathrm{TrEnc}_{TT}$ as follows:
    $(c_1^{(-i)}, \dots, c_{\ell_{FP}}^{(-i)}) \leftarrow_R \mathrm{TrEnc}_{TT}(sk_{-i}, W_{-i})$
    For every $j \in [\ell_{FP}]$, $c_j$ contains $c_j^{(i)}$ in the $i$-th block and $c_j^{(-i)}$ in all other blocks
  Let: $\hat{b}_1, \dots, \hat{b}_{\ell_{FP}} \leftarrow_R \mathcal{P}(sk_S, c_1, \dots, c_{\ell_{FP}})$
  Let: $w' = \hat{b}_1 \| \dots \| \hat{b}_{\ell_{FP}}$
  Let: $i' \leftarrow_R \mathrm{Trace}_{FP}(W, w')$
  Output: 1 if $\mathrm{Trace}_{FP}(W, w') \notin S$, 0 otherwise.
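The ciphertext-splicing step of Algorithm 6 can be illustrated with a small Python sketch. Everything here is a toy stand-in and an assumption for illustration: `gen`, `enc` and `dec` implement an insecure one-bit masking scheme, `challenge` plays the role of the encryption challenger holding the hidden key $sk^{(i)}$, users are 0-indexed, and the pirate decoder and tracing steps are omitted. The sketch only shows that the challenge ciphertexts occupy block $i$ while every user in $S$ still decrypts its own block to its own codeword bit in both cases.

```python
import random
from typing import Dict, List, Tuple

Ciphertext = Tuple[int, int]  # (index r, masked bit)


def gen(rng: random.Random, kappa: int = 16) -> List[int]:
    """Toy key generation: a uniformly random bit string (NOT secure)."""
    return [rng.randrange(2) for _ in range(kappa)]


def enc(rng: random.Random, sk: List[int], bit: int) -> Ciphertext:
    """Toy encryption: mask the bit with a randomly indexed key bit."""
    r = rng.randrange(len(sk))
    return (r, sk[r] ^ bit)


def dec(sk: List[int], c: Ciphertext) -> int:
    r, masked = c
    return sk[r] ^ masked


def challenge(rng: random.Random, message_bits: List[int]) -> List[Ciphertext]:
    """Stand-in for the challenger: one hidden key, one ciphertext per message bit."""
    hidden_key = gen(rng)
    return [enc(rng, hidden_key, m) for m in message_bits]


def mimic_trenc(rng: random.Random, n: int, i: int, keys_S: Dict[int, List[int]],
                W: List[List[int]], challenge_cts: List[Ciphertext]) -> List[List[Ciphertext]]:
    """Build c_1..c_ell: block i is the challenge, other blocks encrypt W honestly."""
    ell = len(W[0])
    ciphertexts = []
    for j in range(ell):
        blocks = [
            challenge_cts[j] if u == i else enc(rng, keys_S[u], W[u][j])
            for u in range(n)
        ]
        ciphertexts.append(blocks)
    return ciphertexts


rng = random.Random(0)
n, ell = 4, 6
W = [[rng.randrange(2) for _ in range(ell)] for _ in range(n)]  # toy codebook
i = rng.randrange(n)                                             # user left out of S
keys_S = {u: gen(rng) for u in range(n) if u != i}               # sk_S

all_zeros_case = mimic_trenc(rng, n, i, keys_S, W, challenge(rng, [0] * ell))
codeword_case = mimic_trenc(rng, n, i, keys_S, W, challenge(rng, W[i]))
```

The two lists differ only in what block $i$ encrypts, which is why any gap in the adversary's output between the two cases translates into an attack on the underlying scheme.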



5.2 Decryption Function Family of $\Pi_{TT}$

In this section we consider the complexity of the decryption function used by our traitor-tracing
scheme $\Pi_{TT}$ (Section 5.1). Since $\Pi_{TT}$ uses an encryption scheme $\Pi_{Enc}$ as a building block, its
complexity will depend on the complexity of the underlying encryption scheme. The following
simple lemma gives a description of the decryption function of $\Pi_{TT}$ as a circuit with gates computing
the decryption function for $\Pi_{Enc}$.

Lemma 5.4 (Decryption Function Family for $\Pi_{TT}$). Let $\Pi_{TT}$ be as defined, with $\Pi_{Enc}$ as its
underlying encryption scheme. Let $(i, sk) = \mathbf{sk} \in \{0,1\}^{\kappa}$ be any user key for $\Pi_{TT}$ and let $c =
(c^{(1)}, \dots, c^{(n)})$ be any ciphertext (for security parameter $\kappa$). Then

$$\mathrm{Dec}_{TT,c}(\mathbf{sk}) = \mathrm{Dec}_{TT,c}(i, sk) = \bigvee_{i' \in [n]} \left(\mathbf{1}_{i'}(i) \wedge \mathrm{Dec}_{c^{(i')}}(sk)\right)$$

Here, the function $\mathbf{1}_x(y)$ takes the value 1 if $y = x$ and 0 otherwise. The truth of the lemma can
easily be verified from the construction of $\mathrm{Dec}_{TT}$. Also note that the function $\mathbf{1}_{i'} : \{0,1\}^{\lceil \log n \rceil} \to
\{0,1\}$ is just a conjunction of $\lceil \log n \rceil$ bits, and we need to compute $n$ of these functions. In addition
to computing $\mathbf{1}_{i'}$ and $\mathrm{Dec}_{c^{(i')}}$, there are $n$ conjunctions and a single outer disjunction. Thus we add
an additional $n + 1$ gates and increase the depth by 2. Hence, an intuitive summary of the lemma
is that if Dec can be implemented by circuits of size $s$ and depth $h$, $\mathrm{Dec}_{TT}$ can be implemented by
circuits of size $s + O(n)$ and depth $h + 2$. This summary will be precise enough to state our main
results.
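As a sanity check on the formula in Lemma 5.4, the following Python sketch evaluates the disjunction of indicator-gated decryptions and compares it with simply decrypting the user's own block. The one-bit `toy_dec` is a hypothetical stand-in for Dec (it is not the paper's scheme), blocks are single bits, and users are 0-indexed.

```python
from typing import Callable, Sequence, Tuple


def indicator(x: int, y: int) -> int:
    """The function 1_x(y): 1 if y == x, and 0 otherwise."""
    return 1 if y == x else 0


def dec_tt(user_key: Tuple[int, int], ciphertext: Sequence[int],
           dec: Callable[[int, int], int]) -> int:
    """OR over all blocks i' of (1_{i'}(i) AND Dec applied to block i')."""
    i, sk = user_key
    out = 0
    for i_prime, block in enumerate(ciphertext):
        out |= indicator(i_prime, i) & dec(sk, block)
    return out


def toy_dec(sk: int, block: int) -> int:
    """Hypothetical one-bit stand-in for Dec: XOR with the key bit."""
    return sk ^ block


n = 5
ciphertext = [1, 0, 0, 1, 1]  # one toy block per user
matches_direct = all(
    dec_tt((i, sk), ciphertext, toy_dec) == toy_dec(sk, ciphertext[i])
    for i in range(n)
    for sk in (0, 1)
)
added_gates = n + 1  # n conjunctions plus the single outer disjunction
```

Only the block selected by the indicator contributes to the disjunction, so the formula agrees with direct decryption of block $i$; the loop body corresponds to the $n$ conjunctions counted in the text.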

By combining the preceding Lemma with Theorem 5.1, we can obtain the following corollary. 

Corollary 5.5 (One-way Functions Imply Traitor-Tracing w/ Poly-Time Decryption). Let $n = n(\kappa)$
be any polynomial in $\kappa$. Assuming the existence of one-way functions, there exists an $(n, \tilde{O}(n^2))$-secure
traitor-tracing scheme with decryption function family $\mathcal{F}_{\mathrm{Dec}_{TT},d} \subseteq \mathcal{Q}_{t}$ for some function $t = t(d) = \mathrm{poly}(d)$
and every $d \in \mathbb{N}$.
Proof. The existence of one-way functions implies the existence of an encryption scheme $\Pi_{Enc}$ that
is $(1/\kappa^a, \kappa^a, \kappa^a)$-secure for every constant $a > 0$ with decryption function family $\mathcal{F}_{\mathrm{Dec},d} \subseteq \mathcal{Q}_{t'}$ for some
$t' = t'(\kappa) = \mathrm{poly}(\kappa)$ and every $d \in \mathbb{N}$. From Lemma 5.4, it is easy to see that if $\Pi_{TT}$ uses $\Pi_{Enc}$ as
its encryption scheme, then $\mathcal{F}_{\mathrm{Dec}_{TT},d} \subseteq \mathcal{Q}_{t}$ for $t(d) = t'(d) + O(n(d)) = \mathrm{poly}(d)$. □

Theorem 1.1 in the introduction follows by combining Theorem 4.1 with Corollary 5.5. 

We will now consider the possibility of constructing a traitor-tracing scheme where the decryption
functionality can be implemented by circuits of constant depth, and thus obtaining hardness
results for generic sanitizers that are efficient for constant-depth queries (Theorem 1.2). We will
give a candidate for such a scheme using the notion of local pseudorandom generators.

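Definition 5.6 (Local Pseudorandom Generator). An efficient algorithm $G : \{0,1\}^{\kappa} \to \{0,1\}^{s_{PRG}(\kappa)}$
is a $(\varepsilon_{PRG}, s_{PRG})$-pseudorandom generator if for every $\mathrm{poly}(s_{PRG}(\kappa))$-time adversary $\mathcal{A}_{PRG}$,

$$\left|\Pr\left[\mathcal{A}_{PRG}(G(U_{\kappa})) = 1\right] - \Pr\left[\mathcal{A}_{PRG}(U_{s_{PRG}(\kappa)}) = 1\right]\right| \leq \varepsilon_{PRG}(\kappa)$$

If, in addition, there exists $L \in \mathbb{N}$ such that for every $\kappa \in \mathbb{N}$, $i \in [s_{PRG}(\kappa)]$, there exist $V_i =
\{v_{i,1}, \dots, v_{i,L}\} \subseteq [\kappa]$ and $g_i : \{0,1\}^{L} \to \{0,1\}$, such that

$$G(s)_i = g_i\left(s_{v_{i,1}}, s_{v_{i,2}}, \dots, s_{v_{i,L}}\right)$$

then $G$ is a $(\varepsilon_{PRG}, s_{PRG}, L)$-local pseudorandom generator.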
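The locality condition in Definition 5.6 is purely structural, and a short Python sketch can make it concrete. The parameters, the index sets `V` and the predicate `g` below are all hypothetical choices made for illustration; the resulting function has the $L$-local shape but makes no claim to pseudorandomness.

```python
import random
from typing import List, Sequence

KAPPA, OUT_LEN, L = 16, 40, 3  # toy parameters, chosen only for illustration

layout_rng = random.Random(7)
# V[i] lists the L seed positions that output bit i is allowed to read.
V: List[List[int]] = [layout_rng.sample(range(KAPPA), L) for _ in range(OUT_LEN)]


def g(bits: Sequence[int]) -> int:
    """Hypothetical L-bit predicate g_i (the same predicate for every i here)."""
    return bits[0] ^ (bits[1] & bits[2])


def G(seed: Sequence[int]) -> List[int]:
    """L-local map: output bit i depends only on the seed bits indexed by V[i]."""
    return [g([seed[v] for v in V_i]) for V_i in V]


seed = [layout_rng.randrange(2) for _ in range(KAPPA)]
output = G(seed)

# Locality check: flipping a seed position outside V[0] cannot change output bit 0.
outside = next(t for t in range(KAPPA) if t not in V[0])
flipped = list(seed)
flipped[outside] ^= 1
bit0_unchanged = G(flipped)[0] == output[0]
```

Because each output bit reads a constant number of seed bits, each $g_i$ has a truth table of size $2^L$, which is what later allows decryption by constant-depth circuits.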

It is a well-known result in cryptography that pseudorandom generators imply encryption
schemes satisfying Definition 3.6 (for certain ranges of parameters). We will use a particular
construction whose decryption can be computed in constant depth whenever the underlying PRG
is locally computable (or, more generally, computable by constant-depth circuits).

Lemma 5.7 (Local PRGs $\Rightarrow$ Encryption). If there exists a $(\varepsilon_{PRG}(\kappa), s_{PRG}(\kappa), L)$-local pseudorandom
generator $G$, then there exists an $(\varepsilon_{Enc} = \varepsilon_{PRG},\ \ell_{Enc} = \sqrt{s_{PRG}(\kappa)})$-secure encryption scheme
(Gen, Enc, Dec) with $\mathcal{F}_{\mathrm{Dec},d} \subseteq \mathcal{Q}_{t,4}$ for some $t = t(d) = \mathrm{poly}(d)$ and every $d \in \mathbb{N}$.

The construction is the standard "computational one-time pad"; however, we give the construction
explicitly to verify that the decryption can be computed by constant-depth circuits.

Proof. We construct the scheme as follows. Let $G : \{0,1\}^{\kappa} \to \{0,1\}^{s_{PRG}(\kappa)}$ be a pseudorandom
generator. Suppose for every $\kappa \in \mathbb{N}$, every bit of $G$'s output is computable from $L$ bits of its seed.
That is, for every $i \in \{1, 2, \dots, s_{PRG}(\kappa)\}$ there is a function $g_i : \{0,1\}^{L} \to \{0,1\}$, and a set $V_i \subseteq [\kappa]$,
$|V_i| \leq L$ such that $G(s)_i = g_i(s_{V_i})$. Here $s_{V_i}$ is the restriction of $s$ to the indices in $V_i$. Let $\mathbf{1}_i(j)$ be
the indicator variable for the condition $j = i$. Then for every $c = (r, b) \in C$, we can write

$$\mathrm{Dec}_{(r,b)}(s) = \bigvee_{i \in [s_{PRG}(\kappa)]} \left(\mathbf{1}_i(r) \wedge \left(g_i(s_{V_i}) \oplus b\right)\right).$$

Observe that, since $g_i$ is a function of $L$ bits of the input, it can be computed by a size-$2^L$, depth-2
circuit; thus $g_i(s_{V_i}) \oplus b$ can be computed by a size-$(2^L + 1)$, depth-3 circuit. The indicator $\mathbf{1}_i$
can be computed by a conjunction of $\lceil \log_2 s_{PRG}(\kappa) \rceil$ bits, which is a size-$\lceil \log_2 s_{PRG}(\kappa) \rceil$, depth-1
circuit. The outer disjunction increases the depth by one level and the size by 1. Putting it all
together, we have shown that $\mathrm{Dec}_{(r,b)}(s)$ can be computed by depth-4 circuits of size $O(2^L s_{PRG}(\kappa)) =
\mathrm{poly}(s_{PRG}(\kappa))$. □
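The identity used in this proof, namely that the disjunction over all output positions equals $G(s)_r \oplus b$, can be checked exhaustively on a tiny instance. In the Python sketch below, the index sets `V`, the predicate `g` and the seed length 4 are hypothetical toy choices, and positions are 0-indexed rather than 1-indexed as in the text.

```python
from itertools import product
from typing import List, Sequence, Tuple

L = 3
# Hypothetical index sets V_i over a seed of length kappa = 4.
V: List[Tuple[int, int, int]] = [(0, 1, 2), (1, 2, 3), (2, 3, 0), (3, 0, 1), (0, 2, 3)]


def g(bits: Sequence[int]) -> int:
    """Hypothetical L-bit predicate; any function of L bits would do."""
    return bits[0] ^ (bits[1] & bits[2])


def G(s: Sequence[int]) -> List[int]:
    """Toy L-local map built from V and g."""
    return [g([s[v] for v in V_i]) for V_i in V]


def dec_formula(s: Sequence[int], r: int, b: int) -> int:
    """OR over i of (1_i(r) AND (g_i(s restricted to V_i) XOR b))."""
    out = 0
    for i, V_i in enumerate(V):
        ind = 1 if i == r else 0
        out |= ind & (g([s[v] for v in V_i]) ^ b)
    return out


# Exhaustive comparison over all seeds, indices r and masked bits b.
formula_matches = all(
    dec_formula(s, r, b) == (G(s)[r] ^ b)
    for s in product((0, 1), repeat=4)
    for r in range(len(V))
    for b in (0, 1)
)
```

Since exactly one indicator is 1 for any fixed $r$, every other term of the disjunction vanishes, which is the reason the flat depth-4 formula computes the same bit as the direct expression.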
Algorithm 7 An encryption scheme $\Pi_{Enc}$ that can be decrypted in constant depth.

Gen($1^\kappa$):
  $s \leftarrow_R \{0,1\}^{\kappa}$
  Output: $sk = s$

Enc($sk$, $b$):
  $r \leftarrow_R \{1, 2, \dots, s_{PRG}(\kappa)\}$
  Output: $c = (r, G(sk)_r \oplus b)$

Dec($sk$, $c$):
  $(r', b') = c$
  Output: $b = G(sk)_{r'} \oplus b'$
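A minimal Python rendering of Algorithm 7 may help in checking correctness of decryption. The generator `G` below is a toy $L$-local map with hypothetical index sets and predicate; it is an assumption for illustration and is not claimed to be pseudorandom, so the sketch demonstrates only the Gen/Enc/Dec mechanics, with 0-indexed positions.

```python
import random
from typing import List, Tuple

KAPPA, S_PRG, L = 16, 64, 3  # toy parameters, chosen only for illustration

_layout = random.Random(1)
# Hypothetical index sets: output bit r of G reads seed positions V[r].
V = [_layout.sample(range(KAPPA), L) for _ in range(S_PRG)]


def G(sk: List[int]) -> List[int]:
    """Toy L-local stand-in for the generator (NOT a secure PRG)."""
    return [sk[a] ^ (sk[b] & sk[c]) for a, b, c in V]


def gen(rng: random.Random) -> List[int]:
    """Gen: sample a uniformly random kappa-bit seed as the key."""
    return [rng.randrange(2) for _ in range(KAPPA)]


def enc(rng: random.Random, sk: List[int], b: int) -> Tuple[int, int]:
    """Enc: pick a random output position r and mask b with G(sk)_r."""
    r = rng.randrange(S_PRG)
    return (r, G(sk)[r] ^ b)


def dec(sk: List[int], c: Tuple[int, int]) -> int:
    """Dec: recompute G(sk)_r for the r stored in c and unmask."""
    r, b_masked = c
    return G(sk)[r] ^ b_masked


rng = random.Random(0)
sk = gen(rng)
round_trips = [dec(sk, enc(rng, sk, b)) == b for b in (0, 1) for _ in range(50)]
```

Decryption recomputes a single output bit of $G$, which reads only $L$ key bits; this is the property the preceding proof exploits to bound the circuit depth.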



Combining Theorem 5.1 with Lemmas 5.4 and 5.7 yields the following corollary. 

Corollary 5.8 (Local Pseudorandom Generators Imply Traitor-Tracing w/ AC$^0$ Decryption). Let
$n = n(\kappa)$ be any polynomial in $\kappa$. Assuming the existence of a $(o(1/n^2), n^5, L)$-local pseudorandom
generator for some constant $L \in \mathbb{N}$, there exists an $(n, \tilde{O}(n^2))$-secure traitor-tracing scheme with
decryption function family $\mathcal{F}_{\mathrm{Dec}_{TT},d} \subseteq \mathcal{Q}_{t,6}$ for some $t = t(d) = \mathrm{poly}(d)$ and every $d \in \mathbb{N}$.

Proof. By Lemma 5.7, the assumed pseudorandom generator implies an encryption scheme with
$\mathcal{F}_{\mathrm{Dec},d} \subseteq \mathcal{Q}_{t',4}$ for some $t' = t'(\kappa) = \mathrm{poly}(\kappa)$ and every $d \in \mathbb{N}$. From Lemma 5.4, it is easy to see
that if $\Pi_{TT}$ uses $\Pi_{Enc}$ as its encryption scheme, then $\mathcal{F}_{\mathrm{Dec}_{TT},d} \subseteq \mathcal{Q}_{t,6}$ for $t(d) = t'(d) + O(n(d)) =
\mathrm{poly}(d)$. □

To support the plausibility of the assumption, we remark that a recent result of Applebaum [App12]
gives a candidate construction of a local pseudorandom generator (in the range of parameters we
require). We refer the reader to [App12] for more discussion of the plausibility of this assumption.

Theorem 1.2 in the introduction follows by combining Theorem 4.1 with Corollary 5.8. 